(57) Abstract



A computer using virtual memory management employs a random-access type
storage device such as a semiconductor memory for page swapping. The
semiconductor memory is formatted to provide multiple partitions of varying
block size, e.g., two block sizes for compressed pages, and another block size
for uncompressed original-sized pages. The data to be stored is in pages of
fixed size, and these pages are compressed for storage if the compressed size
fits in the block size of one of the small-block partitions in the memory. If a
data page is not compressible to one of the small block sizes, it is stored
uncompressed in the other full-size partition. The operating system maintains a
table storing the locations of the pages in the partitions, so upon recall the
page (if compressed) is retrieved from its location found using the table,
decompressed and sent to the CPU. The relative number of blocks in the
partitioned memory (e.g., the physical storage capacity of the memory) is set
(or dynamically allocated) at the average ratio of compressible pages to
uncompressible pages for the compression algorithm used. For example, an
algorithm may compress 90% of the pages to either 50% or 70% of their original
size, so a ratio of the number of locations in the compressed partitions of the
semiconductor memory to the number of locations in the uncompressed partition
is selected as 90:10. The compression mechanism operates on bytes in
bit-parallel format, and uses a lookahead buffer which is compared with bytes
in a window to produce 9-bit symbols. The stream of 9-bit symbols passes
through an ECC generator, also operating in bit-parallel.
SOLID-STATE RAM DATA STORAGE FOR VIRTUAL MEMORY
COMPUTER USING FIXED-SIZE SWAP PAGES

This application is in part a continuation of copending application Serial
No. 627,722, filed December 14, 1990, by William D. Miller, Gary L.
Harrington and Lawrence M. Fullerton, for "Storage of Compressed Data on
Random Access Storage Devices", assigned to CERAM, Inc., the assignee of
the present application.

This invention relates to digital data storage and retrieval, and more
particularly to page-oriented storing of compressed or uncompressed data in
randomly-accessed locations of fixed sizes in partitioned storage devices. The
invention is particularly adapted for storing fixed-size pages swapped with main
memory in a computer system using a virtual memory management scheme.


A computer implementing a virtual memory system typically employs a
certain amount of "physical" memory composed of relatively fast semiconductor
RAM devices, along with a much larger amount of "virtual" memory composed
of hard disk, where the access time of the hard disk is perhaps several hundred
times that of the RAM devices. The physical memory or "main memory" in a
virtual memory system is addressed as words, while the virtual "disk memory" is
addressed as pages. The virtual memory management scheme uses an
operating system such as UNIX™ along with hardware including a translation
buffer, as is well known. In multi-tasking operation where more than one
program runs at the same time, each running in a time slice of its own, each
program appears to have an entire memory space to itself. To make room in
the physical memory to run a new program, or to allocate more memory for an
already-running program, the memory management mechanism either "swaps"
out an entire program (process) to disk memory or "pages" out a portion
(page) of an existing process to disk. A typical page size is 4Kbytes.

Transferring data to and from disk memory is very slow compared to
the transfer time to main memory, and so "solid state disks" (composed of
semiconductor RAMs like the main memory) have been used as a substitute
for magnetic disk to improve system performance. This is at a much higher
cost per megabyte of storage, however, due to the cost of semiconductor
RAMs. Data compression has not been used because of the variable-length
record problem as discussed below, i.e., compressed data blocks are of variable
size, making random access of compressed "pages" of data impractical.

As explained in application Serial No. 627,722, data compression
encoding algorithms are commonly applied to data which is to be archived or
stored at the tertiary storage level. In a hierarchy of data storage, a RAM
directly accessed by a CPU is often referred to as the primary level, the hard
disk as the secondary level, and tape (back up) as the tertiary level. The
characteristic of tertiary level storage as commonly implemented which
supports use of compression is that the data access is largely sequential. Data
is stored in variable-length units, sequentially, without boundaries or
constraints on the number of bytes or words in a storage unit. Thus, if a file
or page being stored compresses to some arbitrary number of bytes, this can be
stored as such, without unused memory due to fixed sizes of storage units.
Compression can be easily applied in any such case where the data is not
randomly accessed but instead is sequentially accessed. For this reason, data
compression works well for data streaming devices such as magnetic tape. It
has been applied to databases holding very large records on magnetic and
optical disks.

Data compression is not readily adaptable for use with random access
storage devices such as hard disks or solid-state disks, although in many cases
it would be desirable to do so. The reason for this lack of use of data
compression is that algorithms for data compression produce compressed data
units which are of variable size. Blocks of data of fixed size compress to
differing sizes depending upon the patterns of characters in the blocks; data
with large numbers of repeating patterns compress to a greater degree than a
more random distribution of characters. Text files and spreadsheet files
compress to smaller units than executable code or graphics files. This problem
of variable-length records has made random access of compressed data
records, as managed by operating systems and controllers in computer systems,
impractical.

It is the principal object of this invention to provide a low-cost,
high-speed, semiconductor memory device useful in a computer implementing page
swapping, as required in virtual memory computer architecture, particularly a
device employing data compression to reduce cost, and using error detecting
and correcting techniques to increase reliability. Another object is to provide
an improved method of storing data in a computer system or the like, and
particularly to provide a method of compressing data pages for storage in a
storage medium having an access capability for storing data units of fixed size.
Another object is to provide an improved data compression arrangement using
a random-access type of storage device, where the data units to be stored and
recalled are of fixed length and the storage device is accessed in fixed-length
increments, where the length is small enough for this to be considered random
access of data. A further object is to reduce the amount of unused storage
space in a storage device when compressed data units are stored, and therefore
increase the storage density. An additional object is to provide an
improvement in the cost per byte of storage capacity in a storage device.

In accordance with one embodiment of the invention, a solid-state
memory unit for page-swap storage employs data compression in which
compressed data partitions are provided in DRAM memory for at least two
different compressed data sizes. Data that will not compress to the block sizes
specified for compressed data is stored uncompressed, in another partition in
the DRAM memory, for example. As set forth in application Serial No. 627,722,
a storage arrangement for compressed data may advantageously use multiple
partitions, where each partition is a section of available physical storage space
having an address known to the system which differentiates it from other
partitions. The data to be stored is in blocks, i.e., units of data of some fixed
size, as distinguished from byte or word oriented data of variable length. The
partitions are capable of holding multiple blocks, each randomly accessible.
The data blocks may be compressed if the compressed size fits in the fixed
block size of one of the partitions in the storage device. To accommodate data
which is compressible to a varying degree, yet avoid waste of unused space in
the partitioned memory device, the partitions are made of differing block sizes;
for example, there may be two partitions, these two having block sizes
corresponding to the typical compressed sizes of the blocks of data. These
compressed sizes may be perhaps one-half and two-thirds the size of the
original data blocks in a typical situation. Data which cannot be compressed
to the two-thirds value or less is either stored in other storage (e.g., the hard
disk) or preferably is stored in a third partition of the memory device with
block size of the original (uncompressed) data. The storage arrangement may
preferably use a semiconductor RAM array, or it may use a combination of
RAM and disk as described in the application Serial No. 627,722.

In one embodiment, a data storage device, such as a bank of DRAMs,
is employed for storing all page-swap data for a virtual memory management
system. The semiconductor memory is partitioned into three parts, two of
these for compressed pages and one for the small percentage of pages that will
not compress to a given size. The two fixed-size compressed block partitions
are formatted for two different compressed block sizes equal to what a
compressed version of the original block size will fit into for the majority of
cases. One of these partitions is for blocks 50% of the original size, and the
other for 70% of original, in one example. The relative number of blocks in
each partition (e.g., the physical storage capacity of each partition) is set at the
average ratio of compressible blocks to uncompressible blocks for the
compression algorithm used. By compressible it is meant that the block of
data can be compressed to the block size of one of the compressed block
partitions, and by uncompressible it is meant that the block will not compress
to the required block size to fit in the compressed block partition. It is
reasonable to select an algorithm that will compress 90% of the blocks to
either 50% or 70% of their original size, so in this case a ratio of the number
of blocks in the compressed partitions to the number of blocks in the
uncompressed partition is selected as 90:10. The size of the blocks is selected
to be some efficient value depending upon the system and the way data is
handled in the system; for example, the block size is probably best selected to
be the page size of 4Kbytes, or a submultiple of the page size. Although the
page size is typically 2K-bytes or 4K-bytes in the most commonly-used
operating systems, other sizes may be appropriate. In the example
embodiment, the block size of uncompressed data is selected to be 4Kbytes
(actually 4096-bytes), while compression to 50% would mean one of the
compressed data block sizes is 2Kbytes (2048-bytes) and compression to 70%
would mean the other block size is about 2.8 Kbytes. A hit rate of
approximately 90% may be achieved with this partitioning. The 10% of pages
found not compressible to the 70% size are stored uncompressed in the third
partition of the DRAM memory.
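
The partitioning rule just described can be illustrated with a short sketch. This is only an illustration under stated assumptions: the function names `choose_partition` and `split_blocks` and the partition labels are hypothetical, and the 70% block size is rounded down to a whole number of bytes (2867), which the text gives only as "about 2.8 Kbytes".

```python
PAGE_SIZE = 4096                     # uncompressed block size given in the text
SMALL_BLOCK = PAGE_SIZE * 50 // 100  # 50% partition: 2048 bytes
LARGE_BLOCK = PAGE_SIZE * 70 // 100  # 70% partition: 2867 bytes (assumed rounding)


def choose_partition(compressed_size: int) -> tuple[str, int]:
    """Return (partition label, block size) for a page of this compressed size."""
    if compressed_size <= SMALL_BLOCK:
        return ("compressed-50", SMALL_BLOCK)
    if compressed_size <= LARGE_BLOCK:
        return ("compressed-70", LARGE_BLOCK)
    # Page is "uncompressible" in the sense used here: store it full size.
    return ("uncompressed", PAGE_SIZE)


def split_blocks(total_blocks: int, compressible_percent: int = 90) -> tuple[int, int]:
    """Split a block budget between compressed and uncompressed partitions."""
    compressed = total_blocks * compressible_percent // 100
    return compressed, total_blocks - compressed
```

With the 90:10 ratio of the example, `split_blocks(1000)` gives 900 block locations for the two compressed partitions together and 100 for the uncompressed partition; how the 900 are divided between the 50% and 70% partitions is not specified in the text.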

In one embodiment, a method is provided which collects statistics on the
page data being handled, and adapts the partitions to optimize capacity based
on the kind of data encountered. Thus, the partitioning is adaptive, changing
according to the compressibility of the page data.

In an alternative embodiment, instead of using the third partition of the
DRAM memory, the ordinary storage device, such as a hard disk, is employed
for pages that cannot be compressed to the threshold 70% size. The disk
storage is used as uncompressed storage, functioning as a partition made up of
addressable locations of a block size equal to that of the original
uncompressed data (e.g., page size of 4KB).

In operation of the preferred embodiment, the computer system sends
(writes) data in blocks (pages) to the storage device, and before being written
the data blocks pass through a compression unit which attempts to compress
the blocks using the algorithm of choice. A counter keeps track of how many
bytes of physical storage are required to store the compressed data. If the
number exceeds the size of the blocks used for physical data storage in the
larger of the two compressed data partitions, then the actual amount of storage
required (value in the counter) is returned to the operating system, which
resends the page to the correct partition, so the data block is written
uncompressed in the other partition and the addressing information
maintained by the operating system reflects this. But if the block is
compressed to the number of bytes of the smaller compressed partition then
the data is stored in this compressed partition, or if compressed to the size of
the larger compressed data partition it is stored thus, and in either event the
location is recorded as such by driver software (added to the operating
system). The driver records the values in the operating system kernel data
structures which map the page-swap device translations. Upon recall, a
request from the computer for a given page is checked against these stored
addresses, and the page is retrieved from the partition where it is found, then,
if necessary, decompressed before sending to the computer. The average
performance of the page swapping operation is greatly enhanced, since the
pages are stored in much faster semiconductor memory.
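
The write and recall flow above can be sketched in software as follows. This is a minimal model, not the described hardware: `zlib` stands in for the Lempel-Ziv compression unit purely for illustration, and the class name `SwapStore`, the partition labels, the 2867-byte size for the 70% block, and the dictionary used as the location table are all assumptions.

```python
import zlib

PAGE_SIZE = 4096
# Block sizes per partition; "large" uses an assumed 70% rounding of 2867 bytes.
BLOCK_SIZES = {"small": 2048, "large": 2867, "full": PAGE_SIZE}


class SwapStore:
    """Toy model of the three-partition page-swap memory and its location table."""

    def __init__(self) -> None:
        self.partitions = {name: [] for name in BLOCK_SIZES}
        self.table = {}  # page address -> (partition label, slot index)

    def write_page(self, page_addr: int, page: bytes) -> str:
        if len(page) != PAGE_SIZE:
            raise ValueError("pages must be exactly PAGE_SIZE bytes")
        packed = zlib.compress(page)  # stand-in compressor (assumption)
        count = len(packed)           # plays the role of the byte counter
        if count <= BLOCK_SIZES["small"]:
            name, data = "small", packed
        elif count <= BLOCK_SIZES["large"]:
            name, data = "large", packed
        else:
            name, data = "full", page  # too big: store the page uncompressed
        slots = self.partitions[name]
        slots.append(data)
        self.table[page_addr] = (name, len(slots) - 1)
        return name

    def read_page(self, page_addr: int) -> bytes:
        name, slot = self.table[page_addr]
        data = self.partitions[name][slot]
        # Decompress only if the page was stored in a compressed partition.
        return data if name == "full" else zlib.decompress(data)
```

A highly repetitive page lands in the small partition, while a page of random bytes fails to compress and is stored full size; in both cases `read_page` returns the original 4096 bytes.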

The performance of the page-swap memory unit as described will
depend upon the speed of the compression and decompression mechanism. If
the data compression requires too long, then the speed advantage of
semiconductor RAMs over hard disk is lost. Therefore, in one embodiment, a
compression arrangement is employed which operates upon one to four byte
segments of data and performs a single-clock compression of this data if a
match is found. In particular, Lempel-Ziv compression circuitry is employed
which performs comparisons of all match sizes of a lookahead buffer to all
positions in a window, for single-clock compression of all matches (one to four
bytes). A tuned Lempel-Ziv algorithm uses 8-bit symbols, with a 64-symbol
window, and a four-symbol lookahead buffer. This algorithm produces output
values that are the same bit-width (9-bits) for either match or no-match to
greatly simplify the task of bit-packing the compressed output. The DRAM
storage for compressed data is arranged in a bit-width (36-bits) that is a
multiple of the compressed data size (9-bits) to simplify the task of circuitry
which bit-packs the compressed output. This compression mechanism is
pipelined so that one byte is passed every clock cycle.
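
A software sketch of such a scheme, with a 64-symbol window, a four-symbol lookahead and fixed 9-bit outputs, is given below. The text does not specify the layout of the 9-bit symbol, so the layout here is an assumption: a flag bit, followed either by an 8-bit literal or by a 6-bit window distance and a 2-bit match length. The function names are hypothetical, and the sequential search loop only models the result of the single-clock parallel compare, not its timing.

```python
WINDOW = 64     # symbols of history that can be matched
LOOKAHEAD = 4   # longest match, in bytes


def compress(data: bytes) -> list[int]:
    """Encode data as 9-bit symbols (assumed layout: flag | distance-1 | length-1)."""
    out = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        for j in range(max(0, i - WINDOW), i):
            length = 0
            # The match must lie entirely inside the already-seen window.
            while (length < LOOKAHEAD and i + length < len(data)
                   and j + length < i
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= 1:
            out.append(0x100 | ((best_dist - 1) << 2) | (best_len - 1))
            i += best_len
        else:
            out.append(data[i])  # literal: flag bit clear, low 8 bits are the byte
            i += 1
    return out


def decompress(symbols: list[int]) -> bytes:
    """Invert compress() under the same assumed symbol layout."""
    out = bytearray()
    for s in symbols:
        if s & 0x100:
            dist = ((s >> 2) & 0x3F) + 1
            length = (s & 0x3) + 1
            start = len(out) - dist
            out += out[start:start + length]
        else:
            out.append(s)
    return bytes(out)
```

Because every output is 9 bits wide whether or not a match is found, a run of four matched bytes costs 9 bits instead of 32, while an unmatched byte costs 9 bits instead of 8.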

An important feature is the use of ECC (error correcting code) to
maintain data integrity, even though DRAMs with potentially high soft error
rates are employed. That is, the DRAMs may have soft error rates which are
not acceptable for use in main memory, and may indeed be slower than
ordinarily used for DRAM storage and have other relaxed specifications; these
devices are referred to as "audio grade" DRAMs by some in the industry. The
data being stored in the memory unit (whether compressed or uncompressed)
passes through an ECC generator circuit to produce a code that is stored with
each block of bytes, then upon recall the ECC circuit checks the code and
makes a correction if a recoverable error is detected. The ECC logic uses a
BCH code with a 9-bit character size to effectively correct errors on the 9-bit
compressed data.

Another feature of the invention is the use of a high-performance DMA
interface to the system bus. A FIFO is included to buffer write data coming
into the compression unit or read data going from the compression unit to the
system bus, and when a bus grant is received a burst of data is sent instead of
just one word. In the interface between the memory controller and the
DRAM memory, a 2-word buffer is employed so that a page-mode read or
write can be implemented if two words are waiting to be accessed.

The novel features believed characteristic of the invention are set forth
in the appended claims. The invention itself, however, as well as other
features and advantages thereof, will be best understood by reference to the
detailed description of specific embodiments which follows, when read in
conjunction with the accompanying drawings, wherein:

Figure 1 is an electrical diagram in block form of a digital system
including a memory for storing pages of data, using features of one
embodiment of the invention;

Figure 2 is a more detailed electrical diagram in block form of a data
compression unit and ECC unit used in the system of Figure 1;

Figure 3 is a more detailed electrical diagram of the compression and
decompression circuits in the system of Figure 2;

Figure 4 is a diagram of the contents of the lookahead buffer and
window buffer in the circuit of Figure 3 for an example of a data input;






WO 92/17844



PCT/US92/02364 






Figure 5 is an electrical diagram of the ECC encoder circuit 45 of
Figure 2;

Figure 6 is a diagram of the code word structure at the output of the
ECC encoder circuit of Figure 5;

Figure 7 is an electrical diagram of a bit serial encoder used for
explaining the function of the circuit of Figure 5;

Figure 8 is an electrical diagram of another bit serial encoder used for
explaining the function of the circuit of Figure 5;

Figure 9 is an electrical diagram of an ECC decoder circuit 46 used in
the system of Figure 2;

Figure 10 is an electrical diagram of a circuit for computing a partial
syndrome as used in the circuit of Figure 9;

Figure 11 is an electrical diagram of a bit-serial S1-calculator used in
the circuit of Figure 9;

Figure 12 is an electrical diagram of a symbol-wide S1-calculator used in
the circuit of Figure 9;

Figure 13 is an electrical diagram of a bit-serial S3-calculator used in
the circuit of Figure 9;

Figure 14 is an electrical diagram of a symbol-wide S3-calculator used in
the circuit of Figure 9;

Figure 15 is an electrical diagram of an ECC error corrector circuit
used in the system of Figure 9;

Figure 16 is an electrical diagram of a mapper circuit used in the error
corrector circuit of Figure 15;

Figure 17 is an electrical diagram of an equality detector circuit used in
the circuit of Figure 16;

Figure 18 is an electrical diagram of an error detection logic circuit
used in the circuit of Figure 16;

Figure 19 is a diagram of the mapping of memory 17 in accordance with
a dynamic allocation method, according to one embodiment; and

Figure 20 is another diagram of mapping the memory 17 in accordance
with the dynamic allocation method.

Referring to Figure 1, a data compression method according to the
invention is used in a system having a source 10 of data to be stored and
recalled, and in a typical application this source would be a CPU or the like,
although various other data sources may use the features herein disclosed. In
particular, the data source 10 is the CPU of a workstation or the like, using a
virtual memory management scheme such as the UNIX operating system
handling fixed-size pages of data (e.g., each page is 4Kbyte). The CPU 10
employs a main memory 11 coupled to the CPU by a system bus 12, and
secondary storage 13 is also coupled to the CPU by the bus 12. When the
CPU 10 has a unit of data to store it is sent by the bus 12 along with
appropriate addresses and controls in the usual manner of operating a CPU
with main memory and disk storage or the like. When the unit is a page to be
swapped, however, as when the CPU 10 is executing an operating system using
virtual memory, the page is written to a swap space in secondary storage 13
(this secondary storage constructed according to the invention taking the place
of what is usually simply a hard disk in conventional systems). The secondary
storage includes a disk 14 operated by a disk controller 15 for storing files in
the usual manner (and also for storing uncompressed fixed-size page data in an
alternative embodiment). In addition, a swap space or swap partition is
provided, as is usual for UNIX virtual memory management, and according to
a preferred embodiment this swap space uses a data compression unit 16 along
with a DRAM memory 17 for storing compressed pages. The data
compression mechanism 16 examines each page of data received from the
CPU 10 during a page-swap operation and determines whether or not
compression is possible for this page. If compression is elected, the
compression mechanism 16 sends the compressed page of data to the memory
17. The memory 17 contains three storage areas 18, 19 and 20 for two fixed
sizes of compressed page-oriented storage and one size of uncompressed
storage. The area 18 is of a size for 50% compression and the area 19 for
70% compression, while the area 20 is for uncompressed pages. If a page can
be compressed to 70% or less of its original 4Kbyte size in the compression
unit 16, then the compressed page of data is sent to the memory 17 to be
stored in the area 18 or area 19, depending upon the degree of compression.
On the other hand, if compression to 70% or less is not possible, the page is
stored in the uncompressed area 20 in a preferred embodiment; alternatively, a
partition in the disk 14 reserved for page swapping, i.e., for virtual memory
management, may be used for these uncompressed pages. The operating
system maintains a table indexed by virtual page address giving the location of
each page, i.e., whether it is present in memory 11, or, if not, which partition in
swap space it is located in.

In the example embodiment, using 4Kbyte page size, and employing
32-bit (4-byte) word width and data bus width in the bus 12, a page is
transferred on the bus 12 as a 1K-word block of data, using a page address (the
low-order 12 bits of the byte address are all zeros). If compressed to 50%, this
1K-word block would be stored in the memory partition 18 in approximately a
512-word block (the bits added by the ECC circuitry would add to the size). If
compressed to 70%, the 1K-word block would be stored in the memory partition
19 in about a 700-word block. If not compressible, it is stored in partition 20
as a 1024-word block (plus ECC increment added). The ratio of sizes of the
partitions 18, 19 and 20 may be selected based upon historical empirical data
of the compressibility of the data for the particular code and data in the task
being executed. Or, the partitioning may be dynamically altered depending upon
the actual compressibility of the currently executing task. In either event, the
partition 20 is of a size needed to store only about 10% of the pages.

The operating system executed by the computer 10 maintains a table in
memory 11 of the locations of pages. As each page is stored in the memory
17, its location is returned to the operating system. Or, if a page is present in
physical memory, this is indicated in the page tables. When a memory
reference is made by the CPU to data in a page not present in physical
memory, then a page fault is executed, resulting in a page swap. Various
algorithms may be used to decide which page to swap out to make room for
the needed page in physical memory. An alternative way of operating the
page swap mechanism of the invention is to mark the uncompressible pages to
be always present in physical memory, rather than storing them in the partition
20, in which case there is no need for the partition 20.

In another alternative embodiment, if the hard disk 14 were used for
uncompressed storage, the mapping of page location could be maintained by
the secondary storage itself, rather than by the operating system. In this case,
for recall of page data stored in the memory 17, the system of Figure 1 would
send a page address (plus controls), and if the disk controller found this
address in a table maintained of pages on disk then a disk read would be
implemented and the page returned to the CPU for writing to main memory
11. The controller for the memory 17 would also search for the page address,
in a table maintained locally, to determine if the page was stored in the
partitions 18 or 19 as compressed data, and if so the page would be read from
the indicated location in partition 18 or 19 and decompressed before returning
the page to the CPU via bus 12 in an uncompressed state. Of course, the page
would be found in either the disk 14 or the memory 17, but not both. The
CPU 10 would merely send out an address on the bus 12 (e.g., the virtual
memory address) to recall a given page, and would not itself need to keep
track of whether the page was stored compressed, nor which partition was used
to store a given page.

With reference to Figure 2, one example of the construction of the data
compression mechanism 16 and its memory controller of Figure 1 using the
features of the invention is illustrated. The compression mechanism receives
data from the CPU 10 by the bus 12 which typically would include an address
bus 12a, a data bus 12b and a control bus 12c all being input to a bus
interface unit 22. The interface unit 22 provides a DMA interface so that
four-word bursts of data may be transferred from the system bus into or out of
the mechanism 16, thereby increasing the overall DMA transfer rate. A FIFO
23 buffers incoming and outgoing data to or from the bus 12. The FIFO 23 is
four words deep, i.e., has a data width of 32-bits (one word) and is four bits
deep for every bit position. A burst-sensing arrangement in the interface
controller 24 generates a Bus-Request for control bus 12c whenever at least
one word is in the four-word FIFO 23 ready to transfer to the bus 12; when
the CPU responds with a Bus-Grant on control bus 12c, the FIFO is checked
to see how many words are ready for transfer, and produces a burst of that
many words (up to four) onto the bus 12. If only one word is ready, then of
course only one word will be transferred. Similarly, up to four words can be
accepted in a transfer from the CPU 10 to the unit 16 via the bus 12;
depending upon how many words are in the FIFO (not yet processed) and how
many words the CPU has ready to send, up to four words can be transferred in
a burst, under control of the controller 24. After transfer into the FIFO 23,
the data is fed one byte at a time to a compression mechanism 26 via bus 27,
using an 8-bit output buffer 28. The buffer 28 is loaded from the FIFO by a
1-of-4 selector 29 which selects one of the four bytes of a word, and by a 1-of-4
selector 30 which selects one of four words in each four-deep bit position 31.
A counter 32 operated by a clock 33 (e.g., the system clock for the CPU 10)
controls the selectors 29 and 30, this same clock controlling the compression
mechanism 26 and other parts of the mechanism 16 as well. The controller 24
may respond to commands on the bus 12 in a manner similar to a disk
controller, i.e., the CPU 10 sends commands and data by writing to registers in
the bus interface controller 24 using the buses 12a, 12b and 12c in the I/O
space of the CPU. The CPU 10 may send a page to be stored in a format
including commands on control bus 12c, an address field on address bus 12a,
and a 4Kbyte data field in bursts of four 4-byte words. An important feature
of the construction of the FIFO 23 is that it uses shift register cells instead of
flip-flop (static) cells or a RAM array, which allows the FIFO to be
implemented in a much smaller number of gates in a gate array.
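
The burst behaviour of the four-word FIFO can be modelled in a few lines. This is only a behavioural sketch: the class and method names are hypothetical, and the byte order used when a 32-bit word is split for the 8-bit bus (most significant byte first) is an assumption, since the text does not state it.

```python
from collections import deque

FIFO_DEPTH = 4  # words, as in the described FIFO


class BurstFifo:
    """Behavioural model of a four-word FIFO that drains in bursts on bus grant."""

    def __init__(self) -> None:
        self.words = deque()

    def push(self, word: int) -> None:
        if len(self.words) >= FIFO_DEPTH:
            raise OverflowError("FIFO full")
        self.words.append(word & 0xFFFFFFFF)  # keep one 32-bit word

    def bus_request(self) -> bool:
        # A request is raised as soon as at least one word is waiting.
        return len(self.words) >= 1

    def on_bus_grant(self) -> list[int]:
        # Transfer however many words are ready (one to four) in one burst.
        burst = list(self.words)
        self.words.clear()
        return burst


def word_to_bytes(word: int) -> list[int]:
    """Split a 32-bit word into four bytes, most significant first (assumed order)."""
    return [(word >> shift) & 0xFF for shift in (24, 16, 8, 0)]
```

If three words arrive before the grant, the grant produces a single three-word burst rather than three separate single-word transfers, which is the source of the improved DMA transfer rate.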

Data to be stored, received from the CPU 10 via FIFO 23, is directed
by the 8-bit bus 27 to the data compression mechanism 26, one byte each clock
cycle. The data compression mechanism 26 may be of various types of
construction, and serves to accept fixed-length segments of data from bus 27,
one byte at a time, and to ultimately produce variable-length pages of output
data on output bus 34. The compression method used is preferably a unique
implementation of the so-called Lempel-Ziv type as described by Ziv, J. and
Lempel, A., "Compression of individual sequences via variable-rate coding",
IEEE Trans. on Information Theory, Sept. 1978, pp. 530-536, or other
improvements may be used such as described by Terry A. Welch, "A technique
for high-performance data compression", IEEE Computer, June 1984. The
mechanism 26 may be a processor itself, executing code, or may be a
sequential state machine, or preferably a logic network constructed of gate
arrays. The amount of compression of a given page of data will depend upon
the degree of repetitiveness of patterns of characters in the page, since the
compression technique is based upon the concept of substituting a shorter code
symbol in place of a sequence of bytes which has appeared before in the page.
In the improved implementation, a window is examined in a 64x8-bit register
35 using a 4x8-bit lookahead buffer 36, and the one-clock compare logic 37
produces a match output 38, a match address output 39 and a match length
output 40, in one clock cycle. This compression mechanism operates upon one
to four byte segments of data and performs a single-clock compression of this
data if a match is found. The Lempel-Ziv compression circuitry employed
performs comparisons of all match sizes of the lookahead buffer 36 to all
positions in the window 35, for single-clock compression of all matches (one to
four bytes). A tuned Lempel-Ziv algorithm uses 8-bit symbols, with a
64-symbol window 35, and a four-symbol lookahead buffer 36. This algorithm
produces output values on the bus 34, each clock cycle, that are the same
bit-width (9-bits) for either match or no-match to greatly simplify the task of
bit-packing the compressed output. This compression mechanism is pipelined so
that one byte is passed every clock cycle.
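
The benefit of a fixed 9-bit output width for bit-packing can be shown with a small sketch that packs four 9-bit symbols into each 36-bit memory word. The function names are hypothetical, and the placement of the first symbol in the low-order bits of the word is an assumption; the text states only that the 36-bit word width is a multiple of the 9-bit symbol width.

```python
SYMBOL_BITS = 9
SYMBOLS_PER_WORD = 4   # 36-bit memory word / 9-bit symbol
SYMBOL_MASK = (1 << SYMBOL_BITS) - 1


def pack_words(symbols: list[int]) -> list[int]:
    """Pack 9-bit symbols four to a 36-bit word, first symbol in the low bits."""
    words = []
    for k in range(0, len(symbols), SYMBOLS_PER_WORD):
        word = 0
        for pos, sym in enumerate(symbols[k:k + SYMBOLS_PER_WORD]):
            word |= (sym & SYMBOL_MASK) << (SYMBOL_BITS * pos)
        words.append(word)  # a short final group is zero-padded
    return words


def unpack_words(words: list[int], count: int) -> list[int]:
    """Recover the first `count` symbols from packed 36-bit words."""
    symbols = []
    for word in words:
        for pos in range(SYMBOLS_PER_WORD):
            symbols.append((word >> (SYMBOL_BITS * pos)) & SYMBOL_MASK)
    return symbols[:count]
```

Because no symbol ever straddles a word boundary, the packer needs no variable shifts or carry-over between words, which is the simplification the fixed width is meant to provide.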

The 9-bit data on bus 34 is applied to an ECC generate circuit 45, to
produce a code that is stored with each block, then upon recall an ECC
detect/correct circuit 46 checks the code and makes a correction if a
recoverable error is detected. The ECC logic uses a BCH code with a 9-bit
character size to effectively correct errors on the 9-bit compressed data, one
9-bit byte at a time, as it comes in on the bus 34. A linear feedback shift
register, with the 9-bit symbols in bit-parallel, receives the incoming data and
generates the ECC code on the fly in the generator 45.



30 



The DRAM controller 49 receives the 9-bit data from bus 47 into a 9-bit
buffer 50, and loads this data into a word-wide two-word buffer 51 via a
selector 52. The buffer 51 is 4X9-bits wide (one word) and two words deep, so
there can be two 36-bit words ready to apply to the 36-bit wide memory bus 53
at any given time. The memory 17 is configured as two banks 55 and 56 of
DRAM devices, preferably 4-Meg, 16-Meg or larger for each memory device.
Note that the DRAM storage for compressed data is arranged in a bit-width
(36-bits) that is a multiple of the compressed data size (9-bits) to simplify the
task of circuitry which bit-packs the compressed output. If the depth of the
partition 18 were 512-words, a DRAM having 2048 columns would fit four
pages in one column width. Then, if the partition 19 was about 70% of a
1024-word page size, three pages would fit in a column. The object is to fill
the DRAM with a minimum of unused memory area, for maximum economy.
The two-word buffer 51 along with sensing logic in control circuitry 54
responsive to the content of the buffer provides a page-mode write, thereby
decreasing the effective memory cycle time; when one word of data is ready to
be written to the DRAMs, a write cycle is initiated (involving RAS going
active and write-enable going active), and if a second word of data is in the
buffer 51 ready to be written before the write operation is completed (before
CAS goes inactive-high) and if the second word is to be written to the same
page, then the normal single-word write is changed to a page-mode write (CAS
goes inactive-high while RAS stays active-low, then CAS goes active-low again)
and both words are written in one RAS cycle. The controller 13 generates a
single set of RAS and CAS strobes on lines 57 and 58 for each two banks 55 and
56 of DRAMs, along with separate sets of write-enable and output-enable
controls on lines 59 going to the two banks. This economizes on the number
of output pins needed for the gate array used to construct the DRAM
controller.
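
The bit-packing of four 9-bit symbols into one 36-bit memory word can be
pictured with a minimal C sketch. The names pack36 and unpack36 are
hypothetical, and placing the first symbol in the most significant bits is an
assumption made only for illustration; the text states only that four 9-bit
symbols make up one 36-bit word.

```c
#include <stdint.h>

/* Pack four 9-bit symbols into the low 36 bits of a 64-bit word.
 * Assumption: sym[0] lands in the most significant 9 bits. */
uint64_t pack36(const uint16_t sym[4])
{
    uint64_t word = 0;
    for (int i = 0; i < 4; i++)
        word = (word << 9) | (uint64_t)(sym[i] & 0x1FFu);
    return word;
}

/* Recover the four 9-bit symbols from a 36-bit word. */
void unpack36(uint64_t word, uint16_t sym[4])
{
    for (int i = 3; i >= 0; i--) {
        sym[i] = (uint16_t)(word & 0x1FFu);
        word >>= 9;
    }
}
```

Because 36 is an exact multiple of 9, no symbol ever straddles a word
boundary, which is the simplification the paragraph above describes.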

The operations in the system of Figure 2 are pipelined, so in a single
clock cycle a number of operations are taking place. In a compression
operation, a data byte is transferred from the FIFO 23 into the lookahead
buffer 36 via buffer 28 and bus 27 in a clock cycle, so a new byte is added to
the lookahead buffer in each clock. A comparison is made of a byte in the
lookahead buffer 36 with the 64-byte sliding window 35 and a match indication
or symbol vector, or non-match character, is generated in each clock cycle. A
symbol vector or non-match character is presented to the ECC circuit via bus
34 in a clock cycle (some cycles are skipped when a match is pending), and a
9-bit ECC symbol is presented to the buffer 50 via bus 47 in a clock cycle. A
36-bit word is available in the buffer 51 every four clock cycles for storage in
the slow DRAM memory 17. In a decompression operation, a 36-bit word
(multiple ECC-based symbol vectors and/or characters) is transferred from the
slow DRAM 17 to the buffer 51 so that a 9-bit symbol is available to the ECC
circuit via buffer 50 and bus 47 in each clock cycle. The ECC detection is
carried out in the detection/correction circuit 46 at the rate of one symbol per
clock. In the decompress circuit 60, which uses the same lookahead buffer 36
and sliding window 35, a 9-bit symbol is converted or a non-match character is
transferred to the sliding window in a clock cycle. Data is transferred
from the sliding window 35 to the FIFO at one byte per clock, so on average
the transfer rate to the system bus (at four bytes per word) is one byte per
clock or one word per four clocks.

Referring to Figure 3, the compression circuitry in the mechanism 16
includes a four-byte lookahead buffer 36 and a sixty-four byte window buffer
35, with the data from the buffer 28 being clocked into the right-hand end, one
byte each clock cycle. The compare circuitry 37 checks to see if any two-byte,
three-byte or four-byte sequence in the window 35 is the same as what is in the
lookahead buffer 36, and, if so, substitutes the address (in the range 0-63) of
the beginning of the sequence and the length (2-, 3- or 4-bytes) of the
sequence, in place of the data itself, as the address 39 and the length 40.
Thus, a 9-bit output data value is sent to the ECC circuit from the compare
circuit that is either (1) a value of the format 61 seen in Figure 3 which has a
field 62 which is the same as the input data byte with a 9th bit in field 63 that
indicates no-match, or (2) a value of the format 64 which includes a 6-bit
address field 39 and a 2-bit length field 40 indicating how many bytes are
matched, along with a 9th bit in field 63 that indicates "match data". In a clock
cycle, the byte in position A of the lookahead buffer 36 is compared with all
64-bytes of data in the window 35 (bytes 0-63) by a set of sixty-four compare
circuits 67, one for each position of the window 35. If no compare is found in
any of the sixty-four compare operations performed, the original character is
sent out as the 9-bit format 61, the data shifts one position to the left in
window 35 and buffer 36, a new byte enters at position D of the lookahead
buffer 36, and another 64-position compare is made of the byte now in position
A. If a compare is found, a hold is set in a flip-flop 68 for this position (0-63)
of the window 35, and a left shift is executed and a new byte of data enters the
right-hand position D of the lookahead buffer. Another byte is now in the
position A (the byte that was previously in position B) and another compare is
made by the sixty-four compare circuits 67. If another compare is found, this
means that there are two adjacent bytes that are identical to two adjacent
bytes in the window. The "hold" condition previously set in the flip-flop 68 is
held, using the AND gate 69, as will be explained, and another left shift is
executed in the next clock, with another new byte entering position D.
Another compare is performed by the sixty-four circuits 67, and this continues
up to four compares. That is, the maximum number of identical bytes to be
found is four, and these four are replaced with one 9-bit value of format 64 in
the data stream sent to the ECC circuit via bus 34. So, if a 4Kbyte page of
8-bit data composed of all identical values (e.g., all zeros) was sent to the
compression circuit 26, the maximum compression produces 1024 9-bit characters
or symbols of format 64, that is, 28% of the original size. During clocks where
a match has been found, the output on bus 34 is a null or no-op, so there are
gaps of one, two or three clock cycles when no output value is placed on the
bus 34 from the compression circuit.

Each one of compare circuits 67 has two 8-bit inputs 70 and 71, with the
input 70 being the contents of one of the window 35 positions (bytes 0-63), and
the other input 71 being the contents of the position A of the lookahead buffer
36. If the contents are identical, an equality indication is produced at an
output 72. This output is applied to the NAND gate 69, and the output 73 of
the gate is applied to the flip-flop 68. The flip-flop is initialized to a "match"
state after each data value is outputted to bus 34, so the feedback via line 74
from output 75 allows the AND gate 69 to pass the match output 72 if it is
high, indicating a match for the present compare. So, on the first clock, the
byte compare is ANDed with the flip-flop output 75 to indicate if there is
another match, and, if so, that state is latched (maintained) in the flip-flop by
the input 73. All of the sixty-four flip-flops 68 at positions where there is no
match will be switched to the "off" condition, so they will no longer be in
contention for a multiple-byte match. After four clocks, only the flip-flops 68
of the sixty-four that have registered four matches in a row will still be "on".
The sixty-four flip-flop outputs 75 go to a priority encoder 76 to generate on
lines 77 the 6-bit address 39 of the lowest-number first byte of a four-byte
match, i.e., the match closest to the lookahead buffer 36. After the match
address is sent out as a format 64 match symbol, all flip-flops 68 are initialized
to the starting state for a new compress cycle. If, after only three bytes have
matched, the fourth byte shows no more matches, the current outputs of the
flip-flops are immediately sent to the priority encoder 76 to generate the
address of the 3-byte match. The same applies to 2-byte matches. A line 78
connects the output 73 of each of the sixty-four NAND gates 69 to a 64-input OR
gate 79 to produce an output 80 to indicate there is a match somewhere in the
window 35; so long as the output 80 is high, a new compare cycle is not
started. The number of cycles that the output 80 stays high is used to generate
the length field 40 of 2-, 3- or 4-bytes.

Instead of the compare circuit shown, where only one of the bytes of
the lookahead buffer 36 is compared to the window 35 each cycle, the entire
four bytes of the lookahead buffer can be compared in a single "gang compare"
cycle. In such a case, if a four-byte compare was detected, the symbol 64
would be sent out as described above, then there would be three no-op cycles
where no compare is done while new data is shifted into the lookahead buffer.
The one-byte at a time compare as described above requires up to 75% fewer
gates in a gate array for implementation, however, and still maintains the
throughput speed of one clock per byte input. Also, the number of bytes in
the lookahead buffer 36 could be increased, but this would require a larger
format 61, 64 for the output symbols. If the number was eight, the field 40
would be 3-bits, for example. Likewise, an increase in the number of bytes in
the window 35 is possible, and again would require an increase in the number
of bits in the address field 39.


An example of a compression sequence using the circuit of Figure 3 is
illustrated in Figure 4. The input string in the example is the text
THIS_IS_TEST_A_THIS_IS_TEST_B_ . . . No match is found until after
the fifth clock cycle, then a 3-byte match is found for "IS_" at address "02"
which is clocked out on the sixth cycle. Then no match is found until clock
seventeen where "_T" is matched at address "07" and clocked out at the
seventeenth cycle. Four-byte matches are found to be clocked out at cycle-18,
-22, -26 and -31. In this example the number of input bytes is thirty-six, and
the number of output symbols is twenty, so the output occupies 20 x 9 = 180
bits against 36 x 8 = 288 bits of input, providing a compression to 62.5% of
original size.

The decompression operation uses the same 64-byte window buffer 35
as is used for compression. The 9-bit symbols from the bus 34 are loaded, one
each clock cycle, to the buffer 82, and the ninth bit field 83 is used to
determine whether the symbol is compressed or not. If not compressed, the
8-bit field 62 which is the original character is loaded to byte-0 of the window
35 via lines 84. If compressed (field 83 a "1") the address field 39 is applied by
path 85 to a 1-of-64 selector 86, while the number field 40 is applied via path
87 to a selector 88 which picks 2-, 3- or 4-bytes starting at the location selected
by the selector 86, and feeds this value back to the byte-0 position via path 89
and lines 84, as the contents of window 35 are clocked to the left according to
the number in field 66. Decompression thus proceeds at the clock rate, one
byte per clock. The decompressed data is applied to the bus 27 at the same
time it is shifted into the byte-0 position, and consists of the reconstructed
original data.
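
The compression and decompression steps described above can be modelled in
software by the following minimal C sketch. It is not the gate-array circuit:
the function names lz9_compress and lz9_decompress, the placement of the
address in bits 7..2 and the length code in bits 1..0, the encoding of lengths
2, 3 and 4 as codes 1, 2 and 3, and the empty starting window are all
assumptions made for illustration. Address 0 denotes the most recent byte in
the window, and ties go to the lowest address, as with the priority encoder.

```c
#include <stddef.h>
#include <stdint.h>

#define LZ9_WINDOW 64   /* bytes of history that are searched  */
#define LZ9_MAXLEN 4    /* depth of the lookahead buffer       */

/* Compress n bytes into 9-bit symbols (one per uint16_t).
 * out[] needs room for n symbols. Returns the symbol count. */
size_t lz9_compress(const uint8_t *in, size_t n, uint16_t *out)
{
    size_t pos = 0, nsym = 0;
    while (pos < n) {
        unsigned best_len = 0, best_addr = 0;
        size_t hist = pos < LZ9_WINDOW ? pos : LZ9_WINDOW;
        for (unsigned addr = 0; addr < hist; addr++) {
            size_t src = pos - 1 - addr;      /* addr 0 = newest byte */
            unsigned len = 0;
            while (len < LZ9_MAXLEN && pos + len < n &&
                   in[src + len] == in[pos + len])
                len++;
            if (len > best_len) {             /* keep the lowest address */
                best_len = len;
                best_addr = addr;
            }
        }
        if (best_len >= 2) {                  /* match symbol, 9th bit set */
            out[nsym++] = (uint16_t)(0x100u | (best_addr << 2) | (best_len - 1));
            pos += best_len;
        } else {                              /* literal, 9th bit clear */
            out[nsym++] = in[pos++];
        }
    }
    return nsym;
}

/* Expand symbols back into bytes. Returns the byte count. */
size_t lz9_decompress(const uint16_t *sym, size_t nsym, uint8_t *out)
{
    size_t pos = 0;
    for (size_t i = 0; i < nsym; i++) {
        if (sym[i] & 0x100u) {
            unsigned addr = (sym[i] >> 2) & 0x3Fu;
            unsigned len = (sym[i] & 3u) + 1;
            size_t src = pos - 1 - addr;
            for (unsigned k = 0; k < len; k++)  /* byte-wise copy permits overlap */
                out[pos + k] = out[src + k];
            pos += len;
        } else {
            out[pos++] = (uint8_t)sym[i];
        }
    }
    return pos;
}
```

With an empty starting window, 4096 zero bytes yield 1025 symbols in this
sketch (one literal followed by matches), close to the 1024 quoted in the text
for the hardware.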


Referring to Figure 5, the ECC encoder circuitry 45 is illustrated. The
input from the bus 34 is a series of 9-bit data words as generated by the
compression circuit. The output to the bus 47 from the ECC encoder 45 is a
sequence of fifty-six 9-bit data words as illustrated in Figure 6. An input of
485 data bits (in 53 9-bit symbols plus 8-bits) has nineteen parity bits added to
it to produce the output sequence of 56 9-bit words, a total of 504 bits, as
labelled in Figure 6. This is referred to as a (504, 485) code.

The generator polynomial used in the encoder of Figure 5 is

    g(x) = m_0(x) \cdot m_1(x) \cdot m_3(x)

where

    m_0(x) = x + 1
    m_1(x) = x^9 + x^4 + 1
    m_3(x) = x^9 + x^6 + x^4 + x^3 + 1

The polynomials are from Table C2 of Peterson & Weldon, "Error-Correcting
Codes", MIT Press (1972). Multiplying these polynomials gives

    g(x) = x^{19} + x^{18} + x^{16} + x^{15} + x^{13} + x^{12} + x^{11} + x^{10}
           + x^9 + x^6 + x^4 + x^3 + x + 1
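
The product of the three factor polynomials can be checked with a few lines of
C. This is a sketch under stated assumptions: polynomials over GF(2) are held
as bit masks (bit i is the coefficient of x^i), the helper names gf2_poly_mul
and bch_generator are hypothetical, and the exponents of m_1(x) and m_3(x) are
taken to be x^9 + x^4 + 1 and x^9 + x^6 + x^4 + x^3 + 1.

```c
#include <stdint.h>

#define BCH_M0 0x003u   /* x + 1                         */
#define BCH_M1 0x211u   /* x^9 + x^4 + 1                 */
#define BCH_M3 0x259u   /* x^9 + x^6 + x^4 + x^3 + 1     */

/* Multiply two polynomials over GF(2): shift-and-XOR, no carries. */
uint32_t gf2_poly_mul(uint32_t a, uint32_t b)
{
    uint32_t product = 0;
    while (b) {
        if (b & 1u)
            product ^= a;
        a <<= 1;
        b >>= 1;
    }
    return product;
}

/* g(x) = m_0(x) * m_1(x) * m_3(x), a degree-19 polynomial. */
uint32_t bch_generator(void)
{
    return gf2_poly_mul(gf2_poly_mul(BCH_M0, BCH_M1), BCH_M3);
}
```

Under these assumptions bch_generator() returns 0xDBE5B, whose set bits are
19, 18, 16, 15, 13, 12, 11, 10, 9, 6, 4, 3, 1 and 0, a fourteen-term
polynomial of degree nineteen.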

Encoding is performed by dividing the shifted message polynomial
x^{19} m(x) by g(x) and appending the remainder p(x) to x^{19} m(x) to form a
code word. That is, if

    x^{19} m(x) / g(x) = q(x) + p(x) / g(x)

then

    c(x) = x^{19} m(x) + p(x)
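
The division step can be sketched in C as a bit-serial shift register with
premultiplication by x^19. This is an illustration rather than the bit-parallel
circuit of Figure 5: the names bch_parity and poly_mod_g are hypothetical, one
message bit is stored per array element with the highest-order bit first, and
G_POLY assumes the generator polynomial has terms x^19, x^18, x^16, x^15, x^13,
x^12, x^11, x^10, x^9, x^6, x^4, x^3, x and 1.

```c
#include <stddef.h>
#include <stdint.h>

#define G_POLY      0xDBE5Bu               /* assumed g(x), bit i = coeff of x^i */
#define PARITY_BITS 19
#define PARITY_MASK ((1u << PARITY_BITS) - 1u)

/* p(x) = x^19 * m(x) mod g(x); bits[0] is the highest-order message bit. */
uint32_t bch_parity(const uint8_t *bits, size_t nbits)
{
    uint32_t reg = 0;
    for (size_t i = 0; i < nbits; i++) {
        /* feedback = bit leaving the register XOR the incoming message bit */
        uint32_t fb = ((reg >> (PARITY_BITS - 1)) & 1u) ^ (bits[i] & 1u);
        reg = (reg << 1) & PARITY_MASK;
        if (fb)
            reg ^= G_POLY & PARITY_MASK;
    }
    return reg;
}

/* Plain remainder of r(x) mod g(x); zero for every valid code word. */
uint32_t poly_mod_g(const uint8_t *bits, size_t nbits)
{
    uint32_t reg = 0;
    for (size_t i = 0; i < nbits; i++) {
        reg = (reg << 1) | (bits[i] & 1u);
        if (reg & (1u << PARITY_BITS))
            reg ^= G_POLY;
    }
    return reg;
}
```

Appending the nineteen bits returned by bch_parity to a 485-bit message,
highest-order bit first, gives a 504-bit word for which poly_mod_g returns
zero, which is the defining property of c(x) = x^19 m(x) + p(x).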

Operation of the encoder circuit of Figure 5, which functions in
bit-parallel, is best explained by first referring to Figure 7, where a
conventional bit-serial feedback shift circuit is shown which would encode this
(504, 485) code. This is an encoder for g(x), using premultiplication by x^{19}.
The serial input data m(x) enters at input 75 and exits at output 76, and is
also applied to the feedback loop 77 via gate 78, where the last 19-bits of a
504-bit series are masked by an input 79. The sequence of gates 80 and delays
81 results in an output of nineteen parity bits on line 82, following the
485-bit serial data on line 76. Figure 8 shows a circuit which performs exactly
the same function as the circuit of Figure 7, but uses premultiplication by
x^{18}. In operation the 485-bit message polynomial is shifted into this
register at input 84 and path 85, and simultaneously shifted out to the channel
at output 86, then the register is shifted once with a zero input. During this
time the encoder outputs the high-order parity bit (in position x^{18}) at
output 87. At this point the feedback connection via gate 88 is disabled and
the last 18-bits of the parity check polynomial are shifted out via output 87.

Figure 5 shows a circuit which performs the same function as the
encoder of Figure 8; it differs in that it inputs nine data bits at a time, in bit
parallel, accepting the symbols of format 61 or 64 from the compression
mechanism 26. In operation, the encoder circuit of Figure 5 receives on nine
parallel lines 84 as input the 485 data bits with a single trailing zero, as
fifty-four 9-tuples or 9-bit symbols. This data is also output on nine lines 86
(only one shown). After the last 9-bit symbol (which contains the trailing zero)
is input the encoder contains the nineteen parity checks (to be bits x^0 - x^{18}
of Figure 5). The high-order parity check bit x^{18} is then output on line 89 and
substituted for the trailing zero in the last data symbol, and the remaining
eighteen parity checks are outputted as two 9-tuples on lines 87. The resulting
code word, which consists of fifty-six 9-bit symbols, is depicted in Figure 6.
The two 9-bit registers holding p_0 to p_8, and p_9 to p_17, are each made up of
nine flip-flop circuits.

The truth table of the circuit Me in Figure 5 is given in Table 1. This
table indicates which of the nine inputs m_1 + p_10, m_2 + p_11, ...,
m_8 + p_17, p_18 must be XORed to produce each of the nineteen outputs
f_0 ... f_18. For example, the right-most column of the table shows that

    f_18 = (m_1 + p_10) + (m_8 + p_17)

while the left-most column shows that

    f_0 = (m_1 + p_10) + (m_2 + p_11) + (m_[illegible] + p_[illegible]) + p_18

Because the code has nineteen parity check bits, the high-order parity
check bit (f_18 in Figure 5) must be inserted into the low-order position (m_0)
of the last data symbol. Thus the encoder of Figure 5 must be augmented by a
2:1, 1-wide, mux (accepting the outputs 87 and 89) to implement this
substitution.

The ECC decoder 46 is illustrated in Figure 9. This circuit receives the
fifty-six 9-bit words from the DRAM via bus 47 and produces a 9-bit wide
output to bus 34, with fifty-six 9-bit input words producing an output of
53+8/9 9-bit words. The fifty-six symbols go into a buffer 90, and if no error
is detected, the data is shifted out beginning right after the 56th symbol shifts
into the buffer 90. The nineteen parity bits are stripped off before the data is
shifted out. There will be no errors in the vast majority of ECC blocks shifted
in; errors will occur only once in hours of operation.

The first step in decoding the received word r(x) in the circuit of Figure
9 is to compute three partial syndromes in the syndrome circuits 91, 92 and 93.
These are

    S_0 = r(1) = e(1) = 0   if 0, 2, 4, ... errors
                      = 1   if 1, 3, 5, ... errors

    S_1 = r(\alpha) = e(\alpha) = \sum_{i=1}^{v} x_i

    S_3 = r(\alpha^3) = e(\alpha^3) = \sum_{i=1}^{v} x_i^3

Here the x_i are the error locations and v denotes the number of errors. If
v = 0 then S_0 = S_1 = S_3 = 0. Note that S_0 is a 1-bit quantity while S_1 and
S_3 are 9-bit elements of GF(2^9).

In the event there is only one error, then

    S_0 = 1   and   S_1^3 + S_3 = 0

In this case the location of the single error is given by x_1 = S_1.

If two errors occur these partial syndromes are related to the
coefficients of the error locator polynomial

    \Sigma(x) = (x + x_1)(x + x_2) = x^2 + \sigma_1 x + \sigma_2

by the key equations

    S_0 = 0
    S_1 = \sigma_1
    S_3 = S_1^2 \sigma_1 + S_1 \sigma_2

These equations can be solved to give

    \sigma_1 = S_1
    \sigma_2 = (S_1^3 + S_3) / S_1

Factoring \Sigma(x) gives its two roots x_1 and x_2. With x_1 and x_2 known,
decoding is complete.

When three errors occur

    S_0 = 1   and   S_1^3 + S_3 \neq 0

This combination of events can always be used to detect a three-error pattern.
Some odd-weight patterns with five or more errors will also be detected in this
way.
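
The case analysis above can be summarized in a short C sketch. It is an
illustration only: GF(2^9) is assumed to be built on x^9 + x^4 + 1, elements
are 9-bit masks in the polynomial basis, and the names gf512_mul, gf512_pow,
bch_classify and bch_sigma2 are hypothetical. A result of BCH_DOUBLE means
only that the syndromes are consistent with two errors; whether the locator
polynomial actually factors is a separate check.

```c
#include <stdint.h>

/* Multiply in GF(2^9), reducing by x^9 + x^4 + 1 (mask 0x211). */
uint16_t gf512_mul(uint16_t a, uint16_t b)
{
    uint16_t r = 0;
    while (b) {
        if (b & 1u)
            r ^= a;
        b >>= 1;
        a <<= 1;
        if (a & 0x200u)
            a ^= 0x211u;
    }
    return r;
}

/* Raise a to the power e by square-and-multiply. */
uint16_t gf512_pow(uint16_t a, unsigned e)
{
    uint16_t r = 1;
    while (e) {
        if (e & 1u)
            r = gf512_mul(r, a);
        a = gf512_mul(a, a);
        e >>= 1;
    }
    return r;
}

enum { BCH_DETECTED = -1, BCH_NO_ERROR = 0, BCH_SINGLE = 1, BCH_DOUBLE = 2 };

/* Classify the error pattern from the three partial syndromes. */
int bch_classify(unsigned s0, uint16_t s1, uint16_t s3)
{
    uint16_t d = (uint16_t)(gf512_pow(s1, 3) ^ s3);   /* S1^3 + S3 */
    if (s0 == 0 && s1 == 0 && s3 == 0)
        return BCH_NO_ERROR;
    if (s0 == 1 && s1 != 0 && d == 0)
        return BCH_SINGLE;            /* location is x1 = S1 */
    if (s0 == 0 && s1 != 0 && d != 0)
        return BCH_DOUBLE;            /* candidate two-error pattern */
    return BCH_DETECTED;              /* e.g. three errors: S0 = 1, d != 0 */
}

/* sigma2 = (S1^3 + S3) / S1, using S1^-1 = S1^510 in GF(2^9). */
uint16_t bch_sigma2(uint16_t s1, uint16_t s3)
{
    uint16_t d = (uint16_t)(gf512_pow(s1, 3) ^ s3);
    return gf512_mul(d, gf512_pow(s1, 510));
}
```

For two errors at locations x_1 and x_2, bch_sigma2 returns the product
x_1 x_2, as expected from \Sigma(x) = (x + x_1)(x + x_2).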

The ECC partial syndrome calculators 91, 92 and 93 are circuits for
computing the three partial syndromes S_0, S_1, and S_3. Figure 10 shows the
S_0-calculator 91, which is a circuit that simply computes the sum (modulo 2) of
all of the 504 bits of the received word. Figure 11 shows a bit-serial circuit 92
which computes

    S_1 = r(\alpha) = r_0 + r_1 \alpha + r_2 \alpha^2 + ... + r_{503} \alpha^{503}

Figure 12 shows a symbol-wide circuit 92 which performs the same
function on nine bits r_0 - r_8 as Figure 11 does on one bit. The operation of
this circuit will be explained and the function performed by the circuit
defined.

Denoting the present state of the bit-serial circuit by
P = (p_8 p_7 p_6 ... p_0) = \alpha^j, and inputting the polynomial
r(x) = r_8 x^8 + ... + r_0, after nine shifts the register will contain

    (...((\alpha^j \alpha + r_8)\alpha + r_7)\alpha + r_6)\alpha + ... + r_0
        = \alpha^9 P + r(\alpha)

Thus if the circuit M_1 in Figure 12 is designed so that it multiplies its
input (p_0 to p_8) by \alpha^9 this circuit 92 will perform the same calculation
as that of Figure 11.

If the contents of the 9-bit register 95 are represented as

    P(x) = p_8 x^8 + p_7 x^7 + ... + p_0

then to multiply by \alpha^9 we must simply calculate

    \alpha^9 P(\alpha) = p_8 \alpha^{17} + p_7 \alpha^{16} + ... + p_0 \alpha^9

The truth table of an \alpha^9-multiplier is shown in Table 2. After the 56th
clock (56-symbol input) the register 95 contains the 9-bit S_1 syndrome, which
is output to the error corrector 96.
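
The equivalence of the bit-serial and symbol-wide S_1 calculators can be
checked numerically with the following C sketch. It assumes GF(2^9) is built
on x^9 + x^4 + 1, so that \alpha is the mask 0x002 and \alpha^9 reduces to
\alpha^4 + 1, the mask 0x011. The function names are hypothetical, and each
symbol is taken to arrive with r_8 first.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply in GF(2^9), reducing by x^9 + x^4 + 1 (mask 0x211). */
uint16_t gf512_mul(uint16_t a, uint16_t b)
{
    uint16_t r = 0;
    while (b) {
        if (b & 1u)
            r ^= a;
        b >>= 1;
        a <<= 1;
        if (a & 0x200u)
            a ^= 0x211u;
    }
    return r;
}

/* Bit-serial form: one step per received bit, P <- P*alpha + r. */
uint16_t s1_bit_serial(const uint8_t *bits, size_t nbits)
{
    uint16_t p = 0;
    for (size_t i = 0; i < nbits; i++)
        p = (uint16_t)(gf512_mul(p, 0x002u) ^ (bits[i] & 1u));
    return p;
}

/* Symbol-wide form: one step per 9-bit symbol, P <- P*alpha^9 + r(alpha).
 * In the polynomial basis r(alpha) is simply the symbol's bit pattern. */
uint16_t s1_symbol_wide(const uint16_t *sym, size_t nsym)
{
    uint16_t p = 0;
    for (size_t i = 0; i < nsym; i++)
        p = (uint16_t)(gf512_mul(p, 0x011u) ^ (sym[i] & 0x1FFu));
    return p;
}
```

Feeding the same data to both functions, once as individual bits and once as
9-bit symbols, gives the same register contents after every ninth bit.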

Figure 13 shows a bit-serial circuit 93 which computes

    S_3 = r(\alpha^3) = r_0 + r_1 \alpha^3 + r_2 \alpha^6 + ... + r_{503} \alpha^{1509}

If the present state of this circuit is denoted by P = (p_8, p_7, ..., p_0) then
after nine shifts the register 97 will contain

    (...((P \alpha^3 + r_8)\alpha^3 + r_7)\alpha^3 + ... )\alpha^3 + r_0
        = \alpha^{27} P + r(\alpha^3)

Figure 14 shows a symbol-wide syndrome generator circuit 93 which performs
the same function as the serial circuit of Figure 13. Table 3a shows the truth
table of the circuit M_3 (which multiplies by \alpha^{27}), while Table 3b shows
the corresponding table for the circuit (which evaluates r(x) at \alpha^3). The
register 97 made up of nine flip-flops contains the S_3 syndrome after the 56th
clock.

Referring to Figure 15, the error corrector 96 in the decoder of Figure
9 is shown in more detail. The circuit performs the functions necessary for
decoding as explained above. In operation, as the received word enters the
decoder, the three partial syndromes are calculated in the circuits 91, 92 and
93. After the last 9-bit symbol enters, these three syndromes are checked for
all zeroes (by a controller, not shown). If they are all zero, the data block is
assumed to be correct and is outputted from the data buffer 94 to the bus 27.
If any of the partial syndromes are non-zero, then the error-correction process
explained below is executed. In this case, data flow is halted until correction is
complete. The likelihood of an error is very small, probably not occurring
more than once every few hours in typical operation, so the performance
penalty due to data correction is virtually zero.


The first step in the error correction process is the computation of
S_3/S_1^3. Note that

    S_1^{-3} = S_1^{508} = ((S_1^{127})^2)^2

and that

    S_1^{127} = S_1^1 \cdot S_1^2 \cdot S_1^4 \cdot S_1^8 \cdot S_1^{16} \cdot S_1^{32} \cdot S_1^{64}

Thus S_1^{-3} can be calculated by an appropriate sequence of squarings and
multiplications. Table 4 shows the sequence of steps involved in this
calculation, using the F and G registers of Figure 15, along with a squaring
circuit 98 and a multiply circuit 99, and 3:1 selectors 100 and 101. After this
sequence of steps is performed, S_3 is multiplied by S_1^{-3} in the multiplier 99
and the result stored in Register G. The function of this circuit 102,
generating S_3 S_1^{-3}, would ordinarily be performed by a look-up, using a
ROM for storing the values; to reduce the area required in constructing the
circuitry, the functions of Figure 15 are implemented.
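
The squaring-and-multiplying sequence can be illustrated in C as follows. This
is a sketch of the arithmetic only, not of the register transfers of Table 4:
it assumes GF(2^9) is built on x^9 + x^4 + 1, and the names gf512_mul and
gf512_inv_cube are hypothetical.

```c
#include <stdint.h>

/* Multiply in GF(2^9), reducing by x^9 + x^4 + 1 (mask 0x211). */
uint16_t gf512_mul(uint16_t a, uint16_t b)
{
    uint16_t r = 0;
    while (b) {
        if (b & 1u)
            r ^= a;
        b >>= 1;
        a <<= 1;
        if (a & 0x200u)
            a ^= 0x211u;
    }
    return r;
}

/* Compute s^-3 = s^508 = ((s^127)^2)^2 for a non-zero s. */
uint16_t gf512_inv_cube(uint16_t s)
{
    uint16_t sq = s;    /* holds s^(2^k) */
    uint16_t acc = s;   /* running product, starts at s^1 */
    for (int k = 1; k <= 6; k++) {
        sq = gf512_mul(sq, sq);     /* s^2, s^4, ..., s^64 */
        acc = gf512_mul(acc, sq);   /* s^3, s^7, ..., s^127 */
    }
    acc = gf512_mul(acc, acc);      /* s^254 */
    acc = gf512_mul(acc, acc);      /* s^508 */
    return acc;
}
```

Since every non-zero element of GF(2^9) satisfies s^511 = 1, multiplying the
result by s^3 gives 1, which confirms that s^508 is the inverse of s^3. Eight
squarings and six multiplications are used in total.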


Adding 1 (i.e., inverting the lower-order bit) to this result gives the
quantity

    A = 1 + S_3 / S_1^3

used in factorization of the quadratic \Sigma(x).

Here the 8-tuple A* on line 103 is formed by deleting the lower-order
bit of A and then multiplying A* by the matrix (M_8*)^{-1} in a circuit 104. The
resulting 8-bit product, with a 0 appended in the lower-order position, is the
quantity Z_1 on line 105. Then Z_1 is multiplied by \sigma_1 (= S_1) in
multiplier 106 to give the first error-location number X_1 on line 107; next the
second error-location number is calculated in exclusive-OR circuit 108:

    X_2 = X_1 + S_1

At this point both error locations are fed to the mapper 109, as inputs X_1 on
lines 107 and X_2 on lines 110 (both 9-bits wide); the mapper 109 is shown in
detail in Figure 16. Basically this mapper circuit generates all possible
error-location numbers from \alpha^{503} to \alpha^0 in descending order; this
circuit function could also be accomplished by a look-up using a ROM containing
these values, but is preferably implemented as the circuit of Figure 16 for
construction in a gate array. The mapper circuit of Figure 16 computes these
location numbers in sets of nine; first it computes \alpha^{503}, \alpha^{502},
..., \alpha^{495} using the register 111, loaded with an initial value, and the
feedback via path 112; then it computes \alpha^{494}, \alpha^{493}, ...,
\alpha^{486}, etc. At each step it compares X_1 and X_2 to these numbers using
the compare circuits 113 receiving the computed numbers on lines 114, and, if
it finds a match, it outputs a 1 on lines 115 in the appropriate bit of the
correction symbol. Figure 17 shows a circuit which can be used for these
comparison circuits 113, employing an exclusive-OR circuit 116 and a NOR
gate 117.
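
The two-error location step can be sketched in C. Substituting x = S_1 z into
x^2 + S_1 x + \sigma_2 = 0 gives z^2 + z + A = 0 with A = 1 + S_3/S_1^3, so
X_1 = S_1 z and X_2 = X_1 + S_1. The sketch below finds z by exhaustive
search, which is a plainly different technique swapped in for the matrix
multiplication of circuit 104; GF(2^9) is assumed to be built on
x^9 + x^4 + 1 and all function names are hypothetical.

```c
#include <stdint.h>

/* Multiply in GF(2^9), reducing by x^9 + x^4 + 1 (mask 0x211). */
uint16_t gf512_mul(uint16_t a, uint16_t b)
{
    uint16_t r = 0;
    while (b) {
        if (b & 1u)
            r ^= a;
        b >>= 1;
        a <<= 1;
        if (a & 0x200u)
            a ^= 0x211u;
    }
    return r;
}

/* Raise a to the power e by square-and-multiply. */
uint16_t gf512_pow(uint16_t a, unsigned e)
{
    uint16_t r = 1;
    while (e) {
        if (e & 1u)
            r = gf512_mul(r, a);
        a = gf512_mul(a, a);
        e >>= 1;
    }
    return r;
}

/* Find both error-location numbers from S1 and S3.
 * Returns 1 on success, 0 if z^2 + z + A = 0 has no root. */
int bch_locate_two(uint16_t s1, uint16_t s3, uint16_t *x1, uint16_t *x2)
{
    if (s1 == 0)
        return 0;
    /* A = 1 + S3 * S1^-3, with S1^-3 = S1^508 */
    uint16_t a = (uint16_t)(1u ^ gf512_mul(s3, gf512_pow(s1, 508)));
    for (uint16_t z = 0; z < 512; z++) {
        if ((uint16_t)(gf512_mul(z, z) ^ z) == a) {
            *x1 = gf512_mul(s1, z);       /* X1 = S1 * Z1 */
            *x2 = (uint16_t)(*x1 ^ s1);   /* X2 = X1 + S1 */
            return 1;
        }
    }
    return 0;
}
```

The two roots of the quadratic in z differ by 1, so whichever root the search
meets first, the pair (X_1, X_2) is the same up to order.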

The process of computing the error location numbers \alpha^{503},
\alpha^{502}, etc., involves multiplication by various powers of \alpha, as
shown as multiplier circuits 118 in Figure 16. Table 5 shows the matrices for
the nine fixed-element multipliers 118 used in Figure 16.

This completes the correction process. It is necessary also to consider
the task of detecting error patterns which are not correctable, i.e., error
detection. In addition to correcting one or two bit errors per block, the code
can detect all triple-error patterns and about three-quarters of higher weight
patterns. Errors are detected in two ways:

Condition 1) Inconsistency between partial syndrome S_0 and number of
errors corrected.

Condition 2) A = 1 + (S_3/S_1^3) does not satisfy the constraint a_0 = a_5.

To check Condition 1 it is necessary to count the number of corrections
made by the error corrector. The implementation of the error counter 120 of
Figure 15 is straightforward, merely counting the output 115 of the mapper
109. The error detection logic 121, seen in Figure 18, is responsive to the S_0
output from the generator 91, the output 122 from the error counter 120, and
one bit of the A output from the circuit 102. Basically the partial S_0 syndrome
must be 0 if an even number of errors is corrected and 1 if an odd number.
Table 6 lists the six possible combinations of S_0 and the Error Count which can
occur. The table shows that Condition 1 is easily detected with a single
XOR-gate 123.

Checking Condition 2 is straightforward; if a_0 \neq a_5 an uncorrectable
error has been detected. Figure 18 shows a logic circuit which can be used to
generate the detection flag on output 124 in the error corrector 121 of Figure
15, using the two exclusive-OR circuits 123 and 125, and an OR gate 126. The
detection flag is used by the system, i.e., the CPU 10, as status to evaluate
what action to take; usually a fault would be generated using an interrupt
when an uncorrectable error is detected.

Another feature of one embodiment of the invention is the dynamic
allocation of partitioning of the memory 17. The relative sizes of the partitions
18 and 19 in the memory 17 are chosen to fit the data being compressed. This
may be done on an empirical basis, using the history of compressing page data
for a particular task or application. According to another embodiment of the
invention, the partition sizes can be calculated on a dynamic basis using the
currently executing pages.

The logical size of a block of data handled by the system described
above is a page, which is typically 4096 8-bit bytes. Usually the system
addresses data in memory 11 by 8-bit bytes or 32-bit words. The compression
system of Figure 2, however, handles data in components to the right of the
compression mechanism 26 in 9-bit bytes or 36-bit words. The memory 17 is
purely a block addressable device, and, with data compression and ECC,
handles 4096-byte pages (after compression and ECC bits added). The ECC
logic maintains 56-bit words.

The maximum compression capability of the system of Figure 2 is to
replace four 8-bit data bytes on bus 27 with one 9-bit data symbol on bus 34.
The ECC circuit generates onto bus 53 a 56-symbol block (Figure 6) for each
53+ (9-bit)-symbols on bus 34. This ECC output is rounded to the next higher
56-symbols (a symbol is a 9-bit "byte"). Thus, for a system page size of 4096
8-bit data bytes, the effective page size in 9-bit ECC symbols is

    (4096/4)*(56/53), rounded up to next 56 = 1120 (9-bit ECC symbols)

This is equivalent to twenty (i.e., 1120/56) ECC blocks. Since bits are added
to the data being compressed and ECC parity bits are added, a page that is
totally incompressible would have one bit added to each byte in the
compression mechanism 26 and expanded by a factor of 56/53 in the ECC
generator 45 (plus round up), so 4869+ bytes theoretically would be produced
for storing in the memory 17. The driver used to manage the system of Figure
2 assures that only pages that compress to below certain thresholds will be
stored in partitions 18 and 19, however. Those pages that expand are simply
marked in error and not stored. Since system pages compress to various sizes,
a method is provided to manage the data as variable length records.
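
The page-size arithmetic can be written out as a small C helper. The function
name ecc_symbols is hypothetical, and the sketch follows the formula as given
in the text (scale by 56/53, then round up to a whole number of 56-symbol ECC
blocks) rather than modelling the exact 485-bit payload of each block.

```c
/* Number of 9-bit symbols stored for a given count of 9-bit data symbols:
 * scale by 56/53 (rounding up), then round up to a multiple of 56. */
unsigned long ecc_symbols(unsigned long data_symbols)
{
    unsigned long scaled = (data_symbols * 56UL + 52UL) / 53UL;
    return ((scaled + 55UL) / 56UL) * 56UL;
}
```

For the best case of 1024 data symbols (4096 bytes compressed four-to-one) the
helper returns 1120 symbols, i.e. twenty ECC blocks. For an incompressible page
of 4096 symbols it returns 4368 symbols, or 78 ECC blocks, which is 4914 bytes
at nine bits per symbol.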

Compression runs on typical data found in swap operations in a virtual
memory system provide the basis from which certain empirical values can be
chosen. The driver used with the system of Figure 2 manages the swap
memory 17 as a "house". The house contains rooms 20-through-65, divided
into floors to best accomplish the task of managing variable length pages.
Pages which expand to greater than sixty-five ECC blocks are said to be
evicted. For each device of Figure 2, a separate data file is maintained which
is mapped into kernel space during execution so that continuously updated
data points may be more efficiently accrued. If the device's data file (known
as a tenants file) did not at first exist, then one is created during the device
open and filled with the data points obtained from software simulation runs
(known as applicants).



The tenants correspond to the number of pages which compress to fit in
a particular room. The percentage of potential fits in a room to the total
number of rooms is then derived for each tenant, as well as their effective
compression ratios. The values BASEMENT and ATTIC are established to
represent the limits on rooms within the house; popularity establishes the
percentage of hits on those rooms; and efficiency establishes the effective
compression percentage for each room.



    #define BASEMENT 20
    #define ATTIC    65
    #define ROOF     (ATTIC + 1)

    double tenants[ROOF-BASEMENT];
    double popularity[ROOF-BASEMENT];
    double efficiency[ROOF-BASEMENT];

    /* total number of pages recorded across all rooms */
    leases = 0;
    for (room = BASEMENT; room < ROOF; room++)
        leases += tenants[room-BASEMENT];
    for (room = BASEMENT; room < ROOF; room++)
    {
        /* fraction of all pages that fit in this room */
        popularity[room-BASEMENT] =
            tenants[room-BASEMENT] / leases;
        /* stored size of the room relative to a 4096-byte page;
           9.0/8.0 keeps the ratio from truncating to 1 */
        efficiency[room-BASEMENT] =
            room * (9.0/8.0) * (56) / 4096;
    }

The tenants array always exists and is continuously updated; however,
the popularity and efficiency arrays are calculated once on the initial open of
the device and are merely place holders until the house is constructed. The
floorplan of the house consists of the following:

    struct floorplan
    {
        long screen;
        long window;
        long door;
        long front;
        long back;
        long area;
        double efficiency;
        double popularity;
    } model[ROOF-BASEMENT], house[ROOF-BASEMENT];

The house was decided to have four floors. Obviously, the more floors,
the more effective the storage; however, the program used to size the floors
would take longer to run and a desire is to maintain a large first floor for a
better first hit percentage. Therefore, a small change in compression is
allowed to facilitate a larger change in space.

    floors = 4;       /* number of floors */
    stax = 0.020;     /* size tax (compression delta) */
    wtax = 0.080;     /* weight tax (population delta) */

The house is constructed using the values above to find the floors which
provide the most compression (plus or minus stax) given the largest floor space
(plus or minus wtax). All combinations of floor space are analyzed, from...

    20 - 20
    21 - 21
    22 - 22
    23 - 65

to...

    20 - 61
    62 - 62
    63 - 63
    64 - 65

First, the popularity of all the rooms on a floor is summed and stored
in the house. A door is opened between the two rooms joining each floor,
while the effective capacity of each floor is computed and summed with that of
the previous floor. Suppose the following floors are being analyzed:

    20 - 24
    25 - 36
    37 - 46
    47 - 65

The popularity of rooms 20 through 24 is summed and stored in the
house beside the room which divides the floors (room 24). The same is
performed for the other three floors, saving the most popular floor. The
combined effective compression is computed by summing the products of each
floor's efficiency and popularity. The overall effective capacity is thus the
physical capacity of the device divided by the total effective compression. The
weight of the house is computed as the popularity of the largest floor divided
by the total effective compression. Since the goal is to achieve a house with
the smallest possible total effective compression and the largest floor, a small
increase is allowed in compression for every large increase in population on
the most popular floor. An example of a constructed house's floorplan might
be as in Figure 19.
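The arithmetic for one candidate set of floors can be worked through as follows. This is a sketch under stated assumptions rather than the driver's code: `house_weight` and `room_efficiency` are hypothetical names, each floor is charged at the efficiency of its highest room (the room that divides it from the next floor), and the efficiency formula assumes 63 bytes per ECC block in a 4096-byte page.

```c
#define BASEMENT 20
#define ATTIC    65

/* Assumed efficiency of a room: room * 63 bytes out of a 4096-byte page. */
static double room_efficiency(long room)
{
    return (double)room * (9.0 / 8.0) * 56.0 / 4096.0;
}

/* tops[i] is the highest room on floor i, in ascending order, with
   tops[floors-1] == ATTIC. popularity[] is indexed by room - BASEMENT.
   Stores the total effective compression in *scap and returns the
   weight (popularity of the most popular floor divided by *scap). */
double house_weight(const double popularity[], const long tops[],
                    int floors, double *scap)
{
    double total = 0.0, largest = 0.0;
    long lo = BASEMENT;
    int i;

    for (i = 0; i < floors; i++) {
        double pop = 0.0;
        long room;

        for (room = lo; room <= tops[i]; room++)
            pop += popularity[room - BASEMENT];
        total += room_efficiency(tops[i]) * pop;
        if (pop > largest)
            largest = pop;
        lo = tops[i] + 1;
    }
    *scap = total;
    return largest / total;
}
```

For the floors 20-24, 25-36, 37-46 and 47-65 with every room equally popular (an artificial input chosen only for illustration), the total effective compression comes to about 0.751, so the overall effective capacity would be roughly 1.33 times the physical capacity.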



The code sequence which accomplishes all this is as follows:

    scap = 1.;
    wcap = 0.;
    blueprint(BASEMENT,floors,ATTIC,stax,wtax);

    blueprint(room, floors, attic, stax, wtax)
    long room;
    long floors;
    long attic;
    double stax;
    double wtax;
    {
        long a,b;
        double pop;
        double iscap;
        double iwcap;

        if (floors < 2 || floors > attic - room)
            return;
        for (a = room+1; a < attic; a++)
        {
            pop = 0.;
            for (b = a; b > room || b == BASEMENT;)
                pop += popularity[b-- - BASEMENT];
            model[a-BASEMENT].popularity = pop;
            model[room-BASEMENT].door = a;
            if (floors > 2)
            {
                blueprint(a,floors-1,attic,stax,wtax);
                continue;
            }
            pop = 0.;
            for (b = attic; b > a; b--)
                pop += popularity[b-BASEMENT];
            model[attic-BASEMENT].popularity = pop;
            model[a-BASEMENT].door = attic;
            model[attic-BASEMENT].door = BASEMENT;
            b = BASEMENT;
            iscap = 0.;
            pop = 0.;
            while ((b = model[b-BASEMENT].door) != BASEMENT)
            {
                iscap += (efficiency[b-BASEMENT] *
                    model[b-BASEMENT].popularity);
                if (pop < model[b-BASEMENT].popularity)
                    pop = model[b-BASEMENT].popularity;
            }
            iwcap = pop/iscap;
            if (((iwcap > wcap) && (iscap < scap)) ||
                ((iwcap > wcap - wtax) && (iscap < scap - stax)))
            {
                scap = iscap;
                wcap = iwcap;
                do
                    house[b-BASEMENT] = model[b-BASEMENT];
                while ((b = model[b-BASEMENT].door) != BASEMENT);
            }
        }
    }
The floors are next furnished and sorted by popularity, the most
popular floor being addressed first.

    floorsort(BASEMENT);

    floorsort(basement)
    long basement;
    {
        double lpopularity, rpopularity;
        long lroom, mroom, rroom;

        while ((lroom = house[basement-BASEMENT].door) != BASEMENT)
        {
            lpopularity = house[lroom-BASEMENT].popularity;
            mroom = lroom;
            rroom = house[mroom-BASEMENT].door;
            if (rroom == BASEMENT)
                break;
            rpopularity = house[rroom-BASEMENT].popularity;
            while (lpopularity >= rpopularity)
            {
                mroom = rroom;
                rroom = house[mroom-BASEMENT].door;
                if (rroom == BASEMENT)
                    break;
                rpopularity =
                    house[rroom-BASEMENT].popularity;
            }
            /* placement of this test is reconstructed; the printed
               listing is ambiguous at this point */
            if (rroom == BASEMENT)
            {
                /* no later floor is more popular: lroom stays, move on */
                basement = lroom;
                continue;
            }
            /* unlink the more popular floor and relink it ahead of lroom */
            house[basement-BASEMENT].door = rroom;
            house[mroom-BASEMENT].door =
                house[rroom-BASEMENT].door;
            house[rroom-BASEMENT].door = lroom;
        }
    }



From Figure 20 it can be seen that the first floor occupies rooms 25
through 36, the second floor occupies rooms 20 through 24, and so on. For
instance, all pages which can compress to less than or equal to 36 ECC
blocks and greater than 24 blocks are addressed in the range 0 through
32325632. The area maintained by a floor is computed as the total effective
capacity of the device divided by the effective capacity of that floor.

    ecap = pcap / scap;
    lcap = 0;
    room = BASEMENT;
    while ((room = house[(door=room)-BASEMENT].door) !=
        BASEMENT)
    {
        house[room-BASEMENT].capacity =
            (efficiency[room-BASEMENT] *
            house[room-BASEMENT].popularity);
        house[room-BASEMENT].efficiency =
            efficiency[room-BASEMENT];
        house[room-BASEMENT].area = (long)roundoff((ecap *
            house[room-BASEMENT].popularity),
            (double)(4096));
        lcap += (double)house[room-BASEMENT].area;
        house[room-BASEMENT].front =
            house[door-BASEMENT].front +
            house[door-BASEMENT].area;
        house[room-BASEMENT].back =
            house[room-BASEMENT].front +
            house[room-BASEMENT].area -
            (long)4096;
    }

The front/back address range marks the storage location in memory
17 in 8-bit data bytes, and thus correlates exactly to a page aligned system
address offset. When the system needs to address the memory 17, the
driver must know the correct floor to assign since the efficiency and room
number are needed to compute the address needed by the memory
controller 57 hardware. In order to find the correct floor quickly, the driver
maintains a list of screens which have been computed by dividing up the
total logical capacity of the device among all the rooms and assigning the
appropriate floor to those rooms based on the area consumed by the floor.
Thus, a system address is merely divided by the same divisor and used as
the index into the house. The screen value within that entry points to the
correct floor. If, however, the actual address is greater than the value of the
back for that floor, then the room number is simply increased by one.

    repair = (lcap - 4096) / (ATTIC-BASEMENT);
    door = house[BASEMENT-BASEMENT].door;
    start = house[door-BASEMENT].front;
    for (room = BASEMENT-BASEMENT; room < ROOF-BASEMENT;
        room++)
    {
        while (((room * repair) + start) >
            (house[door-BASEMENT].back))
            door = house[door-BASEMENT].door;
        house[room].screen = door;
    }
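The screen lookup itself is described only in prose, so the following sketch shows one way it could work. Everything here is illustrative: `struct floor_range`, `build_screens` and `find_floor` are hypothetical names, floors are identified by a simple index in address order rather than by their room number, and the final correction is written as a loop where the text describes a single step of one.

```c
#define NSCREENS 46   /* one screen per room (ROOF - BASEMENT) */

struct floor_range {
    long front;   /* byte offset of the floor's first page */
    long back;    /* byte offset of the floor's last page */
};

/* screen[i] = index of the floor holding address start + i * step.
   floors[] must be in ascending address order. */
void build_screens(const struct floor_range floors[], int nfloors,
                   long start, long step, int screen[NSCREENS])
{
    int f = 0;
    long i;

    for (i = 0; i < NSCREENS; i++) {
        while (f < nfloors - 1 && start + i * step > floors[f].back)
            f++;
        screen[i] = f;
    }
}

/* Map a page-aligned address to the index of the floor that holds it. */
int find_floor(const struct floor_range floors[], int nfloors,
               const int screen[NSCREENS], long start, long step,
               long addr)
{
    long i = (addr - start) / step;
    int f;

    if (i >= NSCREENS)
        i = NSCREENS - 1;
    f = screen[i];
    /* the screen is exact only at the start of its interval, so step
       forward if the address lies beyond this floor's back */
    while (f < nfloors - 1 && addr > floors[f].back)
        f++;
    return f;
}
```

The table trades a search through the floor list for one division and one array access, with at most a short correction when an address falls just past a floor boundary.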



A similar procedure is used whenever the initial transfer did not fit
on the floor initially attempted. The compression mechanism 16 returns the
actual compressed size in 9-bit ECC bytes. Therefore, the driver need only
divide the value by the size of an ECC block and index to the correct
window via the result. The value of the window marks the start of the next
accommodating floor.
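The window lookup can be sketched without recursion. The code below is an illustration, not the driver's routine: `build_windows` and `floor_for_size` are hypothetical names, the block size of 56 symbols is passed in as a parameter because it is an assumption, and rounding the division up (so that a partly filled block counts as a whole one) is a choice made here rather than something the text specifies.

```c
#define BASEMENT 20
#define ATTIC    65
#define NROOMS   (ATTIC - BASEMENT + 1)

/* tops[] holds the highest room of each floor in ascending order, with
   tops[nfloors-1] == ATTIC. window[r - BASEMENT] becomes the top room
   of the smallest floor able to hold a page of r ECC blocks. */
void build_windows(const long tops[], int nfloors, long window[NROOMS])
{
    int f = 0;
    long room;

    for (room = BASEMENT; room <= ATTIC; room++) {
        while (f < nfloors - 1 && room > tops[f])
            f++;
        window[room - BASEMENT] = tops[f];
    }
}

/* Pick the floor for a page that compressed to `symbols` nine-bit
   symbols, given `block_symbols` symbols per ECC block. */
long floor_for_size(const long window[NROOMS], long symbols,
                    long block_symbols)
{
    long room = (symbols + block_symbols - 1) / block_symbols;

    if (room < BASEMENT)
        room = BASEMENT;
    if (room > ATTIC)
        room = ATTIC;
    return window[room - BASEMENT];
}
```

With floor tops of 24, 36, 46 and 65, a page that needs 25 blocks is sent to the floor whose top room is 36.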

    windowsort(ROOF);

    windowsort(roof)
    long roof;
    {
        long curtain;
        long window = BASEMENT;
        long room = BASEMENT;

        while ((room = house[room-BASEMENT].door) != BASEMENT)
            /* assumed loop body (missing from the printed listing):
               keep the highest floor boundary below roof */
            if (room < roof && room > window)
                window = room;
        if (window == BASEMENT)
            return;
        curtain = window;
        while (curtain >= BASEMENT)
            house[curtain-- - BASEMENT].window = window;
        windowsort(window);
    }

In an alternative embodiment, if the hard disk 14 is used for storing
uncompressible pages, all data pages sent via bus 12 for storage may be
conditionally stored as uncompressed data on disk 14, while at the same
time the page is being compressed in mechanism 26. After the compression
operation has been completed, then if the level of compression produces a
page smaller than the 2800-byte limit or the 2000-byte limit the page is
stored again as compressed data in the partition 18 or 19 in the memory 17,
and the just-stored page of uncompressed data on the disk 14 is invalidated.
Invalidating the uncompressed page merely consists of resetting a bit in a page
address table in the controller 15 to "empty."
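The decision taken once compression finishes can be sketched as below. The names are hypothetical (`finish_store`, `struct page_entry`, the `placement` values), the sketch does not say which of partitions 18 and 19 holds which block size, and treating a page of exactly 2000 or 2800 bytes as fitting is an assumption.

```c
#define SMALL_LIMIT 2000   /* bytes; smaller compressed block size */
#define LARGE_LIMIT 2800   /* bytes; larger compressed block size */

enum placement { IN_SMALL_BLOCKS, IN_LARGE_BLOCKS, ON_DISK };

struct page_entry {
    enum placement where;
    int disk_copy_valid;   /* 1 = uncompressed disk copy in use,
                              0 = marked "empty" */
};

/* Called when compression finishes for a page that has already been
   written uncompressed to disk. */
void finish_store(struct page_entry *e, unsigned compressed_bytes)
{
    e->where = ON_DISK;
    e->disk_copy_valid = 1;
    if (compressed_bytes <= SMALL_LIMIT)
        e->where = IN_SMALL_BLOCKS;
    else if (compressed_bytes <= LARGE_LIMIT)
        e->where = IN_LARGE_BLOCKS;
    if (e->where != ON_DISK)
        e->disk_copy_valid = 0;   /* invalidate the uncompressed copy */
}
```

Writing to disk first means an uncompressible page costs no extra step, while a compressible one costs only a single bit reset after it has been stored in semiconductor memory.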

While this invention has been described with reference to specific
embodiments, this description is not meant to be construed in a limiting
sense. Various modifications of the disclosed embodiments, as well as other
embodiments of the invention, will be apparent to persons skilled in the art
upon reference to this description. It is therefore contemplated that the
appended claims will cover any such modifications or embodiments as fall
within the true scope of the invention.

Table 1. Truth table for circuit of Figure 4.

Table 2. Truth table of circuit M| for calculator of
Figure 7b. (Output qg - P4 ^ pg.) Circuit multiplies
its input 9-tuple by o^.

Table 3a. Truth table of circuit H3 for S3 calculator of
Figure ^ (Output qo - Pq ♦ Pi ♦ P4 ♦ Pfi-± Po-)
Circuit multiplies its input 9-tuple by S".

Table 3b. Truth table of circuit Mq for S3 calculator of
Figure 9. Output t5 - rg * r^.

Table 4. Steps involved in computing

Table 6. Cases under which an uncorrectable error
is detected by inconsistency between Sq and
Number of Errors Corrected.



WO 92/17844 PCT/US92/02364

Table 5. Matrices for fixed-element multipliers
for Mapper Circuit of Figure 10. For

[first matrix: label and entries not legible in this copy]

[label not legible]
100001000
010000100
001000010
000100001
100010000
010000000
001000000
000100000
000010000

α^2
001000000
000100000
000010000
100001000
010000100
000000010
000000001
100000000
010000000

α^6
010000100
001000010
000100001
100010000
110001000
001000000
000100000
000010000
100001000

α^502
100010001
010001000
001000100
000100010
000010001
100011001
010001100
001000110
000100011

α^3
000100000
000010000
100001000
010000100
001000010
000000001
100000000
010000000
001000000

α^4
000010000
100001000
010000100
001000010
000100001
100000000
010000000
001000000
000100000

α^7
[entries not legible in this copy]

α^8
[entries not legible in this copy]



CLAIMS: 

1. A method of storing pages of compressed data in a semiconductor
memory, the pages of data being of fixed size before compression,
comprising the steps of:

a) partitioning the semiconductor memory to provide first and second
memory spaces, each of said memory spaces containing a large number of
memory locations, the memory locations of the first memory space being of
a first size for storing one of said pages of data compressed to at most said
first size, the memory locations of the second memory space being of a
second size for storing one of said pages of data compressed to at most said
second size;

b) receiving and compressing each of said pages of data to produce a
compressed page, and detecting whether or not each said compressed page
is as small as said first size or said second size;

c) storing each of said pages of data in either said first or second
memory space in response to said step of detecting, or, if said page is not
compressed to as small as said first or second size, storing said page
uncompressed in a third memory space.

2. A method according to claim 1 wherein said first and second
memory spaces are larger than said third memory space.

3. A method according to claim 2 wherein said fixed size is about
twice as large as said first size, and wherein said second size is smaller than
said fixed size and larger than said first size.

4. A method according to claim 1 including the steps of providing a
separate address for each of said pages and storing said address in a table
associating a location of a page in said first or second memory spaces.

5. A method according to claim 4 including the step of recalling one
of said pages of data by

d) receiving a request for said one of said pages using said address
for said page;

e) reading said page from said first or second memory space using
said address;

f) decompressing said page.

6. A method according to claim 1 including the step of generating an
ECC code for each of a plurality of blocks of each of said compressed
pages, and wherein said step of storing includes storing said ECC codes with
said compressed pages.

7. A method according to claim 6 wherein said compressed pages
are transferred as parallel 9-bit symbols to said step of generating an ECC
code, and wherein said blocks are transferred to said step of storing as
parallel 9-bit symbols.

8. A memory device for storing pages of data, the pages of data
being of fixed size, comprising:

a) a first memory space in said memory device containing a large
number of memory locations of a first size, each of said memory locations
of a first size storing a compressed version of one of said pages of data;

b) a second memory space in said memory device containing a large
number of memory locations of a second size larger than said first size but
smaller than said fixed size, each of said memory locations of said second
size storing a compressed version of one of said pages of data;

c) means for compressing each of said pages of data to produce a
compressed page, and means for detecting whether or not each said
compressed page is as small as said first size or if not then as small as said
second size;

d) and means for storing each of said pages of data in either said
first memory space or said second memory space in response to said means
for detecting.

9. A memory device according to claim 8 wherein said first and
second memory spaces are defined in semiconductor memory.

10. A memory device according to claim 9 wherein said fixed size is
about twice as large as said first size; and wherein said second size is about
70% as large as said fixed size.

11. A memory device according to claim 8 including means for
receiving a separate address with each of said pages and storing said address
in said memory device associated with a location of said page in said first or
second memory spaces.

12. A memory device according to claim 11 including means for
recalling said data pages, including:

e) means for receiving a request for one of said pages including said
address for said page;

f) means for finding said address for said page in said stored
addresses and determining the location of said page;

g) means for detecting whether or not said page was stored in said
first or second memory space, and, if so, decompressing said page.

13. A memory device according to claim 8 including means for
generating an ECC code for each of a plurality of blocks of each of said
compressed pages, and wherein said means for storing stores said ECC
codes with said compressed pages; and including means for transferring said
compressed pages as parallel 9-bit symbols to said means for generating an
ECC code, and wherein said blocks are transferred to said means for storing
as parallel 9-bit symbols.

14. A memory device according to claim 8 wherein said means for
compressing includes a lookahead buffer containing a number of bytes of
incoming data, and a window buffer containing a larger number of bytes of
recent incoming data, and means for comparing the bytes in said lookahead
buffer to all of the bytes in said window buffer and generating match symbols
if multiple-byte matches are found in said comparing.

15. A memory device according to claim 14 wherein said lookahead
buffer and said window buffer are of bit-parallel format, and the output of
said means for compressing includes an added bit that indicates whether or
not an output of the means for compressing represents a compressed or
non-compressed symbol.

16. A method of storing page-swap data in a virtual memory system,
comprising the steps of:

storing in a semiconductor memory unit a large number of
swap pages either (a) compressed to a level no more than a first value and
stored in a first area of said memory, or (b) compressed to a level no more
than a second value but greater than said first value and stored in a second
area of said memory, or (c) if not compressible to said second value or less
then stored uncompressed in a third area of said memory;

recording the number of pages stored in each of said first,
second and third areas;

partitioning said memory in response to said recorded
numbers to provide boundaries between said first, second and third areas
defined by addresses, to thereby again store said page-swap data in a
minimum of space in said memory; each said partitions defining memory
blocks of said first value, said second value or the size of said pages.

17. A method according to claim 16 wherein said first and second
memory areas are larger than said third memory area; and wherein said
pages are of a fixed size about twice as large as said first value, and wherein
said second value is smaller than said fixed size and larger than said first
value.

18. A method according to claim 16 including the steps of providing
a separate address for each of said pages and storing said address in a table
associating a location of a page in said first or second memory areas; and
further including the step of recalling one of said pages of data by

a) receiving a request for said one of said pages using said address
for said page;

b) reading said page from said first or second memory areas using
said address;

c) decompressing said page.

19. A method according to claim 16 including the step of generating
an ECC code for each of a plurality of blocks of each of said compressed
pages, and wherein said step of storing includes storing said ECC codes with
said compressed pages.

20. A method according to claim 19 wherein said compressed pages
are transferred as parallel 9-bit symbols to said step of generating an ECC
code, and wherein said blocks are transferred to said step of storing as
parallel 9-bit symbols.

21. A method of storing blocks of data in a memory device, the
blocks of data being of fixed size, comprising the steps of:

a) partitioning the memory device to provide a first memory space
containing a large number of memory locations of a first size for storing a
compressed version of one of said blocks of data, and to provide a second
memory space containing a large number of memory locations of said fixed
size for storing an uncompressed version of one of said blocks of data;

b) receiving and compressing each of said blocks of data to produce
a compressed block, and detecting whether or not each said compressed
block is as small as said first size;

c) storing each of said blocks of data in either said first memory
space or said second memory space in response to said step of detecting.

22. A method according to claim 21 wherein said memory device is
a magnetic disk.

23. A method according to claim 22 wherein said first memory space
is a first partition of said disk and said second memory space is a second
partition of said disk.

24. A method according to claim 23 wherein said first partition is
much larger than said second partition.

25. A method according to claim 24 wherein said fixed size is about
twice as large as said first size.

26. A method according to claim 21 including the steps of receiving
a separate address with each of said blocks and storing said address in said
memory device associated with a location of said block in said first or
second memory spaces.

27. A method according to claim 26 including the step of recalling
said data blocks by

d) receiving a request for one of said blocks using said address for
said block;

e) finding said address for said block in said stored addresses and
determining the location of said block;

f) detecting whether or not said block was stored in said first memory
space; and

g) decompressing said block if stored in said first memory space.

28. A memory device for storing blocks of data, the blocks of data
being of fixed size, comprising:

a) a first memory space in said memory device containing a large
number of memory locations of a first size, each of said memory locations
of a first size storing a compressed version of one of said blocks of data;

b) a second memory space in said memory device containing a large
number of memory locations of said fixed size, each of said memory
locations of said fixed size storing an uncompressed version of one of said
blocks of data;

c) means for compressing each of said blocks of data to produce a
compressed block, and means for detecting whether or not each said
compressed block is as small as said first size;

d) and means for storing each of said blocks of data in either said
first memory space or said second memory space in response to said means
for detecting.

29. A memory device according to claim 28 wherein said first and
second memory spaces are defined in a magnetic disk.

30. A memory device according to claim 29 wherein said first
memory space is much larger than said second memory space.

31. A memory device according to claim 30 wherein said fixed size is
about twice as large as said first size.

32. A memory device according to claim 28 including means for
receiving a separate address with each of said blocks and storing said
address in said memory device associated with a location of said block in
said first or second memory spaces.

33. A memory device according to claim 32 including means for
recalling said data blocks, including:

e) means for receiving a request for one of said blocks including said
address for said block;

f) means for finding said address for said block in said stored
addresses and determining the location of said block;

g) means for detecting whether or not said block was stored in said
first memory space; and

h) means for decompressing said block if stored in said first memory
space.

34. A digital system comprising:

a) a source of data blocks of fixed length;

b) storage means having first storage locations of a selected size less
than said fixed length and second storage locations of said fixed length;

c) a data compression unit receiving said data blocks from said
source and compressing each of said data blocks to produce a compressed
block;

d) and means for storing a compressed block in one of said first
storage locations if said compressed block is as small as said selected size,
or for storing data for one of said data blocks in one of said second
locations if said compressed block is larger than said selected size.

35. A system according to claim 34 wherein said first and second
storage locations are defined in a magnetic disk.

36. A system according to claim 35 wherein said storage locations
occupy a first memory space which is much larger than a second memory
space occupied by said second storage locations.

37. A system according to claim 36 wherein said fixed size is about
twice as large as said selected size.

38. A system according to claim 34 including means for receiving a
separate address with each of said blocks and storing said address in said
storage means associated with one of said first or second storage locations.

39. A system according to claim 38 including means for recalling
said data blocks, including:

e) means for receiving a request for one of said blocks including said
address for said block;

f) means for finding said address for said block in said stored
addresses and determining the location of said block;

g) means for detecting whether or not said block was stored in one of
said first storage locations; and

h) means for decompressing said block if stored in one of said first
storage locations.

40. A system according to claim 34 wherein said storage means
includes third storage locations of a third size less than said selected size;
and said means for storing stores a compressed block in said third storage
location if said compressed block is as small as said third size.

41. A system according to claim 34 wherein said first storage
locations are defined in semiconductor memory and said second storage
locations are defined in a magnetic disk.

42. A memory system storing blocks of data of fixed size,
comprising:

a) a first memory space containing memory locations of a first size,
each of said memory locations of a first size storing a compressed version of
one of said blocks of data;

b) a second memory space containing memory locations of said fixed
size, each of said memory locations of said fixed size storing an
uncompressed version of one of said blocks of data;

c) a third memory space containing a table of the locations of blocks
of data in said first and second memory spaces;

d) means for receiving a request for one of said blocks including an
identification of said block;

f) means for finding said identification in said third memory space
and determining the location of said block in said first or second memory
space;

g) means for detecting if said block was stored in said first memory
space, and if so decompressing said block.

43. A memory device according to claim 42 wherein said first and
second memory spaces are defined in a magnetic disk.

44. A memory device according to claim 42 wherein said first
memory space is constructed of semiconductor memory devices and said
second memory space is defined in a magnetic disk device.

45. A memory device according to claim 42 wherein said first and
second memory spaces are defined in semiconductor memory devices of the
non-volatile type.